11/3,K/1 (Item 1 from file: 94)

DIALOG(R) File 94:JICST-EPlus

(c) 2004 Japan Science and Tech Corp (JST). All rts. reserv.

04285000 JICST ACCESSION NUMBER: 99A0734992 FILE SEGMENT: JICST-E
Explorer for Speech Recognition. Middle ware for speech recognition and 
Information Consumer Electronics. 

HATAOKA NOBUO (1) 

(1) Hitachi, Ltd., Cent. Res. Lab. 

Erekutoronikusu, 1999, VOL. 44, NO. 8, PAGE. 31-34, FIG. 4, TBL. 1, REF. 4

JOURNAL NUMBER: F0037AAL ISSN NO: 0421-3513 CODEN: ERKTA 

UNIVERSAL DECIMAL CLASSIFICATION: 681.3:165 681.3.069 681.327.2 

LANGUAGE: Japanese COUNTRY OF PUBLICATION: Japan 

DOCUMENT TYPE: Journal 

ARTICLE TYPE: Commentary 

MEDIA TYPE: Printed Publication 

Explorer for Speech Recognition. Middle ware for speech recognition and 
Information Consumer Electronics. 

ABSTRACT: This paper introduces voice middleware with a speech
recognition engine having a general-purpose microcomputer as a
platform, and outlines the application development. This speech
recognition system is a hidden Markov model (HMM) system of a phoneme
piece. Noise control...

...the additive noise as an on-vehicle noise control to obtain a favorable
result. The speech middleware expands its application range to
on-board devices such as car navigation, set top terminals, game
machines, and...

...DESCRIPTORS: middleware;



11/3,K/2 (Item 2 from file: 94)

DIALOG(R) File 94:JICST-EPlus

(c) 2004 Japan Science and Tech Corp (JST). All rts. reserv.

04284999 JICST ACCESSION NUMBER: 99A0734991 FILE SEGMENT: JICST-E 
Explorer for Speech Recognition. Application to Car Navigation. 

MINOWA TOSHIMITSU (1) 

(1) Matsushita Commun. Ind. Co., Ltd. 

Erekutoronikusu, 1999, VOL. 44, NO. 8, PAGE. 27-30, FIG. 4, TBL. 1, REF. 6

JOURNAL NUMBER: F0037AAL ISSN NO: 0421-3513 CODEN: ERKTA 

UNIVERSAL DECIMAL CLASSIFICATION: 681.3:165 

LANGUAGE: Japanese COUNTRY OF PUBLICATION: Japan 

DOCUMENT TYPE: Journal 

ARTICLE TYPE: Commentary 

MEDIA TYPE: Printed Publication 

Explorer for Speech Recognition. Application to Car Navigation. 

...ABSTRACT: finding a candidate was adopted as the speech recognition.
The candidate is searched by a speech recognition engine by speech
recognition middleware. Among problems of the speech input are 1)
the recognition rate and response, 2) the...
...DESCRIPTORS: middleware;



9/3,K/1 (Item 1 from file: 233)

DIALOG(R) File 233:Internet & Personal Comp. Abs.
(c) 2003 EBSCO Pub. All rts. reserv.

00662448 02WK05-113 

Take charge of custom storage options — FalconStor's IPStor handles 
multiple functions in one package 

Garvey, Martin J 

Information Week, May 13, 2002, n888 p61, 1 Page(s)
ISSN: 8750-6874 

Company Name: FalconStor Software 
Product Name: IPStor 3 

... 2000 IPStor virtualization software for any open-systems storage
architecture. Explains that IPStor creates a layer of middleware
between servers and storage systems that let customers change levels of
capacity for applications as needed. Says that IPStor Version 3 gives
customers new capabilities. Indicates that a Service...

...the features they want running at different times, and administrators
can turn off the virtualization engine when they are replicating data
from one source to another. (EPE)



15/3,K/1 (Item 1 from file: 2)

DIALOG(R) File 2:INSPEC

(c) 2004 Institution of Electrical Engineers. All rts. reserv.

6793248 INSPEC Abstract Number: C2001-02-5260S-014 

Title: Portable speech interpreter using speech recognition technologies 
for microprocessors 

Author(s): Obuchi, Y.; Kitahara, Y.; Koizumi, A.; Matsuda, J.; Hataoka, N.

Author Affiliation: Central Res. Lab., Hitachi Ltd., Kokubunji, Japan 
Journal: Transactions of the Institute of Electronics, Information and 
Communication Engineers D-II vol.J83D-II, no. 11 p. 2309-17 
Publisher: Inst. Electron. Inf. & Commun. Eng, 
Publication Date: Nov. 2000 Country of Publication: Japan 
CODEN: DTGDE7 ISSN: 0915-1923 

SICI: 0915-1923(200011)J83DII:11L.2309:PSIU;1-J
Material Identity Number: M973-2000-012 
Language: Japanese 
Subfile: C 
Copyright 2000, IEE 

Abstract: We have developed a portable interpreter system using the 
speech recognition middleware for embedded microprocessors. The user 
interface for the interpretation function includes sentence retrieval 
using keywords. We have developed two stage recognition of important words 
and general words, and a syllable correction function. It has realized 
the voice input of the large... 
...Identifiers: middleware;



15/3,K/2 (Item 1 from file: 8)

DIALOG(R) File 8:Ei Compendex(R)

(c) 2004 Elsevier Eng. Info. Inc. All rts. reserv.

06035857 E.I. No: EIP02156915216
Title: Human-voice interface

Author: Yoshida, Kazunaga; Hagane, Hiroshi; Hatazaki, Kaichiro; Iso, 
Ken-Ichi; Hattori, Hiroaki 

Source: NEC Research and Development v 43 n 1 January 2002. p 33-36 

Publication Year: 2002 

CODEN: NECRAU ISSN: 0547-051X 

Language: English 

...Abstract: client-type speech interface and a server-type speech
interface. The second is a speaker-independent large-vocabulary
speech-recognition system and a speech-synthesis system, which are
necessary to realize...

Descriptors: Continuous speech recognition; Interfaces (computer);
Internet; Speech synthesis; Text processing; Mobile telecommunication
systems; Servers; Electronic mail; Packet networks; Pattern matching;
Flowcharting; Middleware



16/3,K/1 (Item 1 from file: 2)

DIALOG(R) File 2:INSPEC

(c) 2004 Institution of Electrical Engineers. All rts. reserv.

6793235 INSPEC Abstract Number: B2001-02-6130E-015, C2001-02-5260S-008
Title: Robust speech recognition for car environment noise
Author(s): Kokubo, H.; Amano, A.; Hataoka, N.

Author Affiliation: Central Res. Lab., Hitachi Ltd., Kokubunji, Japan 
Journal: Transactions of the Institute of Electronics, Information and 
Communication Engineers D-II vol.J83D-II, no. 11 p. 2190-7
Publisher: Inst. Electron. Inf. & Commun. Eng,
Publication Date: Nov. 2000 Country of Publication: Japan
CODEN: DTGDE7 ISSN: 0915-1923

SICI: 0915-1923(200011)J83DII:11L.2190:RSRE;1-Q
Material Identity Number: M973-2000-012 
Language: Japanese 
Subfile: B C 
Copyright 2000, IEE 

Abstract: We developed speech recognition middleware for the SuperH
microprocessor. This middleware provides sophisticated user interfaces
to car navigation systems. We study speech recognition for car
environment noise. We propose a noise handling method, spectrum subtraction
with noisy hidden...

...Identifiers: speech recognition middleware;



16/3,K/2 (Item 2 from file: 2)

DIALOG(R) File 2:INSPEC

(c) 2004 Institution of Electrical Engineers. All rts. reserv.

6704958 INSPEC Abstract Number: C2000-10-6180N-029
Title: Natural language dialogue for personalized interaction

Author(s): Zadrozny, W.; Budzikowska, M.; Chai, J.; Kambhatla, N.;
Levesque, S.; Nicolov, N.

Author Affiliation: IBM Thomas J. Watson Res. Center, Hawthorne, NY, USA 
Journal: Communications of the ACM vol.43, no. 8 p. 116-20
Publisher: ACM, 

Publication Date: Aug. 2000 Country of Publication: USA 

CODEN: CACMA2 ISSN: 0001-0782 

SICI: 0001-0782(200008)43:8L.116:NLDP;1-0

Material Identity Number: C056-2000-008

U.S. Copyright Clearance Center Code: 0001-0782/2000/0800$5.00
Language: English 
Subfile: C 
Copyright 2000, IEE 

...Abstract: end systems must respond in accord, and one solution may be
found somewhere in the middle(ware). The pragmatic goal of natural
language (NL) and multimodal interfaces (speech recognition, keyboard
entry, pointing, among others) is to enable ease-of-use for users/customers
in...

16/3,K/3 (Item 3 from file: 2)

DIALOG(R) File 2:INSPEC

(c) 2004 Institution of Electrical Engineers. All rts. reserv.

5726707 INSPEC Abstract Number: C9712-7445-019

Title: Voice recognition and voice synthesis technologies for ITS

Author(s): Ono, O.; Watanabe, T.; Mitome, Y.; Inagaki, K.
Journal: NEC Technical Journal vol.50, no. 7 p. 138-42 
Publisher: NEC, 

Publication Date: July 1997 Country of Publication: Japan 

CODEN: NECGEZ ISSN: 0285-4139 

SICI: 0285-4139(199707)50:7L.138:VRVS;1-1

Material Identity Number: H719-97011 

Language: Japanese 

Subfile: C 

Copyright 1997, IEE

...Abstract: developing voice recognition and voice synthesis technology
which will provide a user-friendly, man-machine interface. NEC has
developed a voice recognition system which is able to recognize an
extensive vocabulary and continuous voice recognition, the "bundle
research" method, to adapt to different acoustic environments. In the area
of voice...

...This technology will be applied to the development of an interactive
voice system platform or middleware for ITS applications, which will be
an indispensable component of vehicles in the future.
...Identifiers: middleware;



16/3,K/4 (Item 1 from file: 8)

DIALOG(R) File 8:Ei Compendex(R)

(c) 2004 Elsevier Eng. Info. Inc. All rts. reserv.

06948954 E.I. No: EIP04308275658 
Title: It's good to talk 

Author: Anon 

Source: Electronic Product Design v 25 n 6 June 2004. 

Publication Year: 2004 

CODEN: EPDEDB ISSN: 0263-1474 

Language: English 

Abstract: Toshiba is developing speech technology as part of Toshiba
Research, providing middleware that can be embedded in the company's
chips, for example the TX RISC processors...

...on quantum encryption algorithms which has developed both automatic
speech recognition and text to speech middleware. Toshiba's approach to
text to speech synthesis, is to code the speech and convert...

Descriptors: Speech recognition; Speech synthesis; Personal
digital assistants; Interfaces (computer); Speech processing;
Microphones; Communication systems; Computer software; Computer hardware

16/3,K/5 (Item 2 from file: 8)

DIALOG(R) File 8:Ei Compendex(R)

(c) 2004 Elsevier Eng. Info. Inc. All rts. reserv.

06129951 E.I. No: EIP02377087141

Title: VoiceXML: Enabling voice access to information 

Author: Kemble, Kimberlee 

Source: Communications Solutions v 7 n 1 January 2002. p 54-57 
Publication Year: 2002

ISSN: 1093-8176
Language: English

...Abstract: standard that utilizes common programming methodologies to
enable voice access to critical e-business applications. Middleware
infrastructure products which utilize VoiceXML, allow customers to access
vital information quickly, with distinguished ease...

...Descriptors: communication; Speech recognition; World Wide Web;
Cellular telephone systems; Java programming language; HTML; Web browsers;
Graphical user interfaces; Servers; Speech synthesis; Database
systems; Computer keyboards



16/3,K/6 (Item 3 from file: 8)

DIALOG(R) File 8:Ei Compendex(R)

(c) 2004 Elsevier Eng. Info. Inc. All rts. reserv.

06073629 E.I. No: EIP02266988543

Title: Conversational natural language understanding interfacing city 
event information 

Author: Mast, Marion; Ross, Thomas; Schulz, Henrik; Harrikari, Heli; 
Demesticha, Vasiliki; Polymenakos, Lazaros; Vamvakoulas, Yannis;
Stadermann, Jan 

Corporate Source: IBM European Speech Research, D-69115 Heidelberg, 
Germany 

Source: Data and Knowledge Engineering v 42 n 3 September 2002. p 343-360 

Publication Year: 2002 

CODEN: DKENEW ISSN: 0169-023X 

Language: English 

Descriptors: Natural language processing systems; Speech recognition;
Linguistics; Middleware; Interfaces (computer); Information retrieval



16/3,K/7 (Item 1 from file: 94)

DIALOG(R) File 94:JICST-EPlus

(c) 2004 Japan Science and Tech Corp (JST). All rts. reserv.

03566057 JICST ACCESSION NUMBER: 98A0469620 FILE SEGMENT: JICST-E
ITS (Intelligent Transport Systems). Element Technologies of ITS. Voice 
Recognition and Voice Synthesis Technologies for ITS. 

ONO OSAMU (1); WATANABE TAKAO (2); MITOME YUKIO (2); INAGAKI KEIKO (2) 
(1) NEC Corp.; (2) NEC Corp. 

NEC Giho (NEC Technical Journal), 1997, VOL. 50, NO. 7, PAGE. 138-142, FIG. 4,
REF. 14

JOURNAL NUMBER: G0475BAB ISSN NO: 0285-4139 

UNIVERSAL DECIMAL CLASSIFICATION: 681.3:165 656.11 

LANGUAGE: Japanese COUNTRY OF PUBLICATION: Japan 

DOCUMENT TYPE: Journal 

ARTICLE TYPE: Commentary 

MEDIA TYPE: Printed Publication 

...ABSTRACT: currently developing voice recognition and voice synthesis
technology which will provide a user-friendly, man-machine interface.
NEC has recently developed a voice recognition system which is able
to recognize an extensive vocabulary and continuous voice recognition,
the "bundle...

...This technology will be applied to the development of an interactive
voice system platform or middle-ware for ITS applications, which
will be an indispensable component of vehicles in the future. (author



16/3,K/8 (Item 1 from file: 144)

DIALOG(R) File 144:Pascal

(c) 2004 INIST/CNRS. All rts. reserv.

14627277 PASCAL No.: 00-0297854 

Sophisticated speech processing middleware on microprocessor 
1999 IEEE 3rd workshop on multimedia signal processing : Copenhagen, 
13-15 September 1999 

HATAOKA N; KOKUBO H; NUKAGA N; OBUCHI Y; AMANO A; KITAHARA Y 

LIU KJ Ray, ed; OSTERMANN Joern, ed; DEPRETTERE Ed, ed; KLEIJN W Bastiaan,
ed; SORENSEN John Aasted, ed

Central Research Laboratory, Hitachi Ltd., Kokubunji, Tokyo 185-8601,
Japan

IEEE Signal Processing Society, United States 

Workshop on multimedia signal processing, 3 (Copenhagen DNK) 1999-09-13 
1999 691-696 

Publisher: IEEE, Piscataway NJ 
Language: English 

Copyright (c) 2000 INIST-CNRS. All rights reserved. 

Sophisticated speech processing middleware on microprocessor 

This paper describes Speech Processing Middleware which has been 
developed on RISC microprocessors aiming for embedded speech applications. 
This middleware consists of a speech recognition module and a speech 
synthesis module, and especially the speech recognition middleware has 
advantages of robustness for environmental noise and speaker differences. 
The speech middleware provides sophisticated user interfaces to 
multimedia systems using microprocessors as CPUs, such as car navigation...

English Descriptors: Speech processing; Speech synthesis; User
interface; Speech recognition; Microprocessor; RISC processor;
Robustness; Multimedia; Software tool; Digital processing; Adaptive
algorithm; Implementation; Printed circuit board...

16/3,K/9 (Item 1 from file: 233)

DIALOG(R) File 233:Internet & Personal Comp. Abs.
(c) 2003 EBSCO Pub. All rts. reserv.

00423913 96BY05-004 

Make voice response sing — A new generation of programming tools helps 
IVR programmers avoid the panic button 

Linthicum, David S 

BYTE, May 1, 1996, v21 n5 p53-56, 3 Page(s)
ISSN: 0360-5280 

... tools. Adds that IVR systems typically use distributed databases on
servers linked by client/server middleware to help callers perform
calculations online or talk directly to an application using enhanced
speech...

Descriptors: Voice Mail; Application Development; Audio Processing;
Speech Recognition; Database; User Interface

DIALOG(R) File 2:INSPEC

(c) 2004 Institution of Electrical Engineers. All rts. reserv.

02529613 INSPEC Abstract Number: B85057319, C85045944 

Title: Applications of automated speech technology to land-based army 
systems 

Author(s): Chambers, R.M.; deHaan, H.J.

Author Affiliation: US Army Res. Inst. for the Behavioral & Social Sci.,
Alexandria, VA, USA

Journal: Speech Technology vol.2, no. 4 p. 92-9
Publication Date: Feb.-March 1985 Country of Publication: USA
CODEN: SPETDB ISSN: 0744-1355 
Language: English 
Subfile: B C 

Title: Applications of automated speech technology to land-based army 
systems 

Author(s): Chambers, R.M.; deHaan, H.J.

DIALOG(R) File 350:Derwent WPIX

(c) 2004 Thomson Derwent. All rts. reserv.



014376787 **Image available**
WPI Acc No: 2002-197490/200226
XRPX Acc No: N02-150031

Middleware layer for mediation between speech related application
and engine in computing system, has processing component configured to
perform speech related services for application and engine
Patent Assignee: MICROSOFT CORP (MICT); CHAMBERS R L (CHAM-I); CONNELL E
W (CONN-I); LIPE R (LIPE-I); SARKAR A (SARK-I); SCHMID P H (SCHM-I);
CHAMBERS R (CHAM-I); CONNELL E (CONN-I); ELLERMAN E C (ELLE-I)
Inventor: CHAMBERS R; CONELL E; LIPE R; SCHMID P H; CHAMBERS R L;
CONNELL E W; SARKAR A; CONNELL E; ELLERMAN E C; ALLEVA F A; HWANG M;
JU Y
Middleware layer for mediation between speech related application
and engine in computing system, has processing component configured to
perform speech related services for application and engine

...LIPE R...

...SCHMID P H...

...CHAMBERS R L...
...CONNELL E W...

...CONNELL E

Abstract (Basic):

The speech middleware component has application
independent interface connected to the application, and an engine
independent interface connected to the engine...
...a) Multi-process speech recognition middleware layer...

...b) Multi-voice speech synthesis middleware layer...

Abstract (Basic):

... An INDEPENDENT CLAIM is included for speech recognition
interface.
Middleware layer for mediation between speech related application
and engine in computing system, has processing component configured to
perform speech related services for application and engine
Abstract (Basic):

... The speech middleware component has application independent
interface connected to the application, and an engine independent
interface connected to the engine. The processing component is
configured to perform speech related services for the application and
the...

...a) Multi-process speech recognition middleware layer...

...b) Multi-voice speech synthesis middleware layer...

...For mediation between application and engines such as speech
recognizer and speech synthesizer in a computing system, PC, server
computer, hand-held or laptop...
ABSTRACT

...and voice control services.

SOLUTION: A PC application 31 uses one call control voice control
middleware 32 to utilize a call control service and a voice control
service provided by the CTI system 33. When the PC application 31 makes a
voice message reproduction request, the call control voice control
middleware 32 receives this request, allows a voice service terminal 32
to make a call to...
...phone, involves receiving information request from device and
detecting device type through single wireless portal middleware
Abstract (Basic):

...multiple wireless and wireline devices of varying configurations
and types through a single wireless portal middleware (WPM) (210) and
detecting a device type through the WPM. The requested information is
retrieved...

Used for providing information to wireless and wireline devices
e.g. voice-only phone, wireless application protocol (WAP) device,
PDA and a laptop computer...
Abstract (Basic):

electrical appliances such as car navigation system and also in
e-mail system, oral statement software, internet applications,
speech recognition middleware software, word processing software,
television receiver, air conditioner, used by physically, visually
disabled persons, elderly people...
Automatic telephone exchange and web mailing system using speech
recognition and method thereof

Abstract (Basic):

... An automatic telephone exchange and web mailing system using
speech recognition and a method thereof are provided to recognize
a voice of a caller, automatically exchange a telephone conversation,
and transmit a voice or a character...

... judges whether inputted exchange information is a voice (S31). If
inputted exchange information is the voice, the call controller
recognizes the voice using an ASR (Automatic Speech Response)
middleware and converts exchange information into data with a suitable
other type. An IVR (Interactive Voice...
Middleware layer between speech related applications and engines

...SPECIFICATION In particular, the present invention relates to a
middleware layer which resides between applications and engines (i.e.,
speech recognizers and speech synthesizers) and provides services, on an
application-independent and engine-independent basis, for both
applications and engines.

Speech synthesis engines typically include a decoder which receives
textual information and converts it to audio information which...

...The present invention provides an application-independent and
engine-independent middleware layer between applications and engines.
The middleware provides speech-related services to both applications
and engines, thereby making it far easier for application vendors...

...to consumers.

In one embodiment, the middleware layer provides a rich set of services
Middleware layer between speech related applications and engines

...SPECIFICATION The present invention provides an application-independent
and engine-independent middleware layer between applications and engines.
The middleware provides speech-related services to both
applications and engines, thereby making it far easier for application
vendors...

...to consumers.

In one embodiment, the middleware layer provides a rich set of services
between speech synthesis applications and synthesis engines. Such
services include parsing of input data into text fragments, format
negotiation...

...application, multivoice mixing processes. 

In yet another embodiment, the invention includes a middleware
component between speech recognition applications and speech
recognition engines. In such an embodiment, the middleware layer
illustratively generates a set of COM objects which configures the
speech recognition engine, handles event notification and enables

interface configured.

...engine.

11. The middleware layer of claim 1 wherein the engine comprises a
text-to-speech (TTS) engine and wherein the processing component
comprises:

a first object having an application interface and an...

...by the engine to begin synthesis. 

16. The middleware layer of claim 1 wherein the engine comprises a 
speech recognition (SR) engine and wherein the processing 
component comprises: 

a first object having an application interface and an engine interface. 

17. The middleware layer of claim 16 wherein the application interface 
exposes a method configured to receive recognition attributes from 
the application and instantiate a specific speech recognition 
engine based on the engine attributes received. 

18. The middleware layer of claim 16 wherein the... engine interface on
the site object is configured to receive result information from the
SR engine indicative of recognized speech.

38. The middleware layer of claim 36 wherein the engine interface on the
site object...

...application.

40. A multi-process speech recognition middleware layer configured to
facilitate communication between a speech recognition (SR) engine
and one or more applications, the middleware layer comprising:
a first process including:

a first context object having an application interface to enable
application control of a first plurality of attributes of the speech
recognition and an engine interface; and

a first grammar object having an application interface and an engine
interface and storing...

...an application interface to enable application control of a first
plurality of attributes of the speech recognition and an engine
interface; and

a second grammar object having an application interface and an engine
interface and storing...

...configured to facilitate communication between one or more applications
and a plurality of text-to-speech (TTS) engines, comprising:
at least a first voice object having an application interface configured
to receive TTS...
Claims

Detailed Description

...of information presentation and feedback from the user; multi-modal
input ability including full duplex voice recognition software for
both command and navigation purposes as well as natural language
processing for notation and...

...the performance support system referred to as a knowledge store; an
advanced Web server-based middleware that assembles and delivers, at
high speed, data from the object-oriented database and delivers... and
audio output device that for example, can be a headphone/microphone
combination 260, a speech recognition module 270, and a manual
interface, that can include a keypad, keyboard, slide, knob, button,
switch, touch screen, and the like...

Claim

... of data objects comprises text.

10. The system of Claim 1, wherein the user interface comprises:
microphone configured to accept user voice commands; and
speech recognition module configured to convert the user voice
commands to electronic requests that are provided to the processor. 11.
The system of Claim...
Biometric identification - the analysis of biological characteristics to
verify individuals' identity (e.g., fingerprints, voice recognition,
retinal scans).

Related to authentication, non-repudiation is a means of tagging a
message in...

...Control for Windows 95; SecurID; Racal's TrustMe Authentication Server;
Visionics' FaceIt; Sensar's IrisIdent; Keyware Technologies' Voice
Guardian; National Registry's NRIdentity; Kerberos; VeriSign

The following are examples of products that perform authentication...
Racal's TrustMe Authentication Server
biometric security

Visionics' FaceIt - face recognition

Keyware Technologies' Voice Guardian - voice recognition

Kerberos - an encryption and key management protocol for third party
authorization; vendors...

Biometric identification - the analysis of biological characteristics to
verify individuals' identity (e.g., fingerprints, voice recognition,
retinal scans).

Related to authentication, non-repudiation is a means of tagging a
message... of packetized data, Circuit Switching services establish
physical circuits for the transfer of circuit-switched voice, fax,
video, etc.

Circuit Switching

uses an end-to-end physical connection between the sender...



14/3,K/7 (Item 6 from file: 349)

DIALOG(R) File 349:PCT FULLTEXT

(c) 2004 WIPO/Univentio. All rts. reserv.

00784134 

A SYSTEM, METHOD AND ARTICLE OF MANUFACTURE FOR A CONSTANT CLASS COMPONENT
IN A BUSINESS LOGIC SERVICES PATTERNS ENVIRONMENT
SYSTEME, PROCEDE ET ARTICLE MANUFACTURE UN COMPOSANT DE CLASSE DE CONSTANTE
DANS UN ENVIRONNEMENT DE SCHEMAS DE SERVICES DE LOGIQUE D'AFFAIRES



14/3,K/8 (Item 7 from file: 349)

DIALOG(R) File 349:PCT FULLTEXT

(c) 2004 WIPO/Univentio. All rts. reserv.

00784132

A SYSTEM, METHOD AND ARTICLE OF MANUFACTURE FOR A LEGACY WRAPPER IN A
COMMUNICATION SERVICES PATTERNS ENVIRONMENT
SYSTEME, PROCEDE ET DISPOSITIF POUR MODULE D'HABILLAGE EXISTANT DANS UN
ENVIRONNEMENT DE SCHEMAS DE SERVICES DE COMMUNICATION

Patent Applicant/Assignee:
ACCENTURE LLP, 1661 Page Mill Road, Palo Alto, CA 94304, US, US
(Residence), US (Nationality)
Inventor(s):
BOWMAN-AMUAH Michel K, 6426 Peak Vista Circle, Colorado Springs, CO 80918,
US,

Legal Representative: 

HICKMAN Paul L (agent), Oppenheimer Wolff & Donnelly, LLP, 1400 Page Mill
Road, Palo Alto, CA 94304, US,
Patent and Priority Information (Country, Number, Date):

Patent: WO 200116724 A2-A3 20010308 (WO 0116724) 

Application: WO 2000US24084 20000831 (PCT/WO US0024084) 

Priority Application: US 99386834 19990831 
Designated States: 

(Protection type is "patent" unless otherwise stated - for applications 
prior to 2004) 

AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CU CZ DE DK DZ EE ES FI GB 
GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK 
MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN 
YU ZW 

(EP) AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE 

(OA) BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG 

(AP) GH GM KE LS MW MZ SD SL SZ TZ UG ZW 

(EA) AM AZ BY KG KZ MD RU TJ TM 
Publication Language: English 
Filing Language: English 
Fulltext Word Count: 150947 

Fulltext Availability: 
Detailed Description 

Detailed Description 

...Dynamics' SecurID Authentication Tokens

Racal's TrustMe Authentication Server
biometric security

Visionics' FaceIt - face recognition
Sensar's IrisIdent - iris identification

Keyware Technologies' Voice Guardian - voice recognition
National Registry's NRIdentity - fingerprint recognition
keys and certificates

Kerberos - an encryption and key management protocol for third party
authorization; vendors... Packetized

transferred through brief, temporary, logical connections between nodes
includes data and packetized multimedia (video, voice, fax, etc.)
Circuit Switching includes the following functionality.

establishes end-to-end path for circuit.

uses an end-to-end physical connection between the sender.
said prosodic data the parameters of which have been changed by the
parameter... of said constraint information, responsive to the emotion
state discriminated by said discriminating means; and
speech synthesizing means for synthesizing the speech based on
said prosodic data, the parameters of which have been changed by the
parameter...
The user speech recognition unit 103 detects a sum of four
emotions, namely the joy/pleasure (JOY/PLEASURE), sorrow...
...Conference), a Naive Bayes classification algorithm), is here used as an
example.

Specifically, the user speech recognition unit 103 includes a
speech input unit 111, a characteristic value extraction unit 112, a...

...space, based on the sort and the likelihood of the emotion detected by
the user speech recognition unit 103 and/or the user image
recognition unit 104 and supplied from the short...

...user's current feeling.

Meanwhile, if the likelihood of the emotion detected by the user
speech recognition unit 103 differs from the likelihood of the emotion
detected by the user image recognition...

...using an average value of the two likelihoods. If the emotion detected
by the user speech recognition unit 103 differs from the emotion
detected by the user image recognition unit 104, the...

...CLAIMS apparatus as said target status.

26. The robot apparatus according to claim 24 further comprising:
speech recognition means and picture recognition means;
said status detection means detecting the feeling status as said...

...used as said target status.

34. The method according to claim 32 further comprising:
a speech recognition step and an image recognition step;
said status detection step detects the feeling status, as...

...SPECIFICATION by color segmentation.

Fig. 20 is a block diagram showing a constituent unit which achieves
recognition of the uttered speech.

Fig. 21 shows an illustrative structure of HMM for inputting unknown
languages.

Fig. 22 shows the results of speech recognition.

Fig. 23 shows the information pertinent to the internal state.

Fig. 24 shows the relationship ... objects stated in a connection file



stored in a memory card 28 (Fig. 2). 

The middle ware layer 40 is positioned as an upper layer of the 
robotics server object 32, and. . . 

...software items providing basic functions of the robot apparatus 1, such 
as picture processing or speech processing. The application layer 41 
is located as an upper layer of the middle ware layer 40, and is a 
set of software items for deciding on the behavior of... 

...1 based on the results of the processing by the software items making up
the middle ware layer 40. Fig. 4 shows specified software structures 
of the middle ware layer 40 and the application layer 41. 

Referring to Fig. 4, the middle ware layer... to find parameters 
therefor. The Hidden-Markov-Model (HMM), which is currently the
mainstream in speech recognition, as later explained, is the
technique of recognition which also belongs to this category. The... based
on the picture information, and the sound perception part 125 is a part 
for speech recognition responsive to the speech input from a 
microphone. The following explanation is made on the processing carried 
out by. . . 

...the shape analysis of an object as cropped by color segmentation. 

(4-2-1-3) Speech recognition 

As the speech recognition, the continuous speech recognition
employing the HMM is used. This technique may be exemplified by that
proposed in the... that 'booru' (ball) is correctly acquired by
unknown-1.

Moreover, since HMM is capable of recognizing the continuous speech,
for the seventh speech from the top in Fig. 22 the symbol 'kere' can be



...next to the label unknown-1 for the previously acquired label unknown-1. 

In such speech recognition system, if a noun 'booru' is acquired,
the robot apparatus 1 is able to kick... of this?', repeatedly confirms 
any input phoneme sequence that is effective for outputting from the 
speech recognition unit. 

On the other hand, if the acquisition behavior by the effect on changes 
in. . . 

...ABSTRACT A1

A robot (1) is proposed which includes a speech recognition unit 
(101) to detect information supplied simultaneously with or just before 
or after detection of... 



result ... each of channels including a color recognition unit 201, shape 
recognition unit 202, and a speech recognition unit 203, and for 
example a binary ID (identification information) is appended to each of 



...short-term memory 211 of an associative memory 210. A speech prototype 
ID from the speech recognition unit 203 is passed through a semantics 
converter (SC) 204 to the short-term memory. . . 

. . .by a color segmentation module and supplies the data to the associative 
memory 210. The speech recognition unit 203 outputs a prototype ID of 
a word uttered by the user or trainer ... designed for minimum action based 
on an ethological model, switching to a tree using a speech 
recognition and an operation test, and a test for the learning. The 
operation test is made. . . 

Claim

... as a keyboard, a mouse, a microphone, a pen, a biometric input device, 



such as voice recognition device, etc. The output device 270 may 
include any conventional mechanism or combination of 
6. . . 



...in real time, obtain real time billing, information, and generate 
reports using a rules-centric middleware core. In one embodiment, a 
customer may perform these functions through a single point of . . . 

...10 uses a Common Object Request Broker Architecture (CORBA) based
publish-and-subscribe messaging middleware to integrate the different
components of the OSS 130. Other techniques for integrating the different
...unit 350 includes an extensible Program Management (XPM) unit 610,
one or more voice portal application servers 620, and a customer
directory database 630. The XPM unit 610 receives user profile...

...the process management system 310 and stores this information for use
by the voice portal application servers 620. The XPM unit 610 may
also receive other information, such as information identifying... 

...etc.) by which the customer wishes to receive the service(s) provided. 

[0063] The voice portal application servers 620 may include one or
more servers that interact with the XPM unit 61... 

...distribution and shipping services, insurance services, health and 
pharmaceutical services, manufacturing services, and the like. Voice 
portal application servers 620 may also provide data collection unit 
336 with information regarding what services are. . . 

. . .may then pass this information on to the billing unit 337 for billing 
purposes. The voice portal application servers 620 may be located at 
the OSS 130 or distributed throughout the network 110. The customer
directories 630 may store information relating to the services provided
by the voice portal application servers 620. For example, the
customer directories 630 may store stock quotes, current weather
forecasts... the OSS 130 offers a unique combination of products and
services (e.g., billing, reporting, voice portal applications, VoIP
services, etc.). In addition to the user having to login (or register) 
with the . . . 

allowing simultaneous multi application
access)

Intelligent Peripheral (Media Control)

Provides services such as DTMF parsing, Voice prompting, Messaging,
Speech recognition, Text to Speech, Text to Fax, etc.
Protocol Conversion (Policy Management) 
Receives session requirements from Rules database 

Selects ... Kbps (thousand bits per second). This rate is not the rate 
required to send digitized voice per se. Rather, 64Kbps is the rate 
required to send voice digitized with the Pulse Code Modulated (PCM) 
technique. Many other methods for digitizing voice exist...

... allowing simultaneous multi application
access)

Intelligent Peripheral (Media Control)

- Provides services such as DTMF parsing, Voice prompting, Messaging,
Speech recognition, Text to Speech, Text to Fax, etc.

Protocol Conversion (Policy Management) 

- Receives session requirements from Rules database 

64... call 3602 by a switch 1206-1210 determines if the call 3602 is an
enhanced voice service/network audio response system (EVS/NARS) call. 
An EVS/NARS is an audio menu. . . 

knowledge and background of the individual maintaining and developing
a software application, maintenance of the software application is 
highly - Idependent on the quality of the application and also on the 
detailed documentation of the various... 

Claims

English Abstract 

In an inventory access system, an integrated voice recognition module 
with a speech processor is used to interface a user with an 
intelligent switch having access to databases containing profiling 
information about travelers... 

Detailed Description 

with web browsers or a Wireless-Application-Protocol
("WAP")-enabled device) interface via middleware programming with the
GDSs and CRSs. The middleware allows communication between the client
device and the GDSs and CRSs, notwithstanding that each of these 
components might use different protocols. The middleware thus often is 
referred to as a "booking engine" or a 
"switching engine." 
5 The . . . 

...information such as travel profiles, business rules, and quality control 
and accounting criteria, and a middleware switching engine that can
accommodate the multiple protocols of the telephony components and the
database . . . 

...with any applicable business rules or accounting procedures. 

At the heart of the system, a middleware switch engine 40 is provided 
that is capable, on the one hand, of accepting and. . . 

Claim 

. . . that transmits data to and receives data from a user 
via a telephone; 

(b) a speech synthesis module that translates information received 
from the user 

interface ; 

(c) an inventory interface that transmits data between the inventory 
access system and one or. . . 

...wherein the inventory databases containing information about items in an 
inventory, and wherein the inventory interface communicates translated 
instructions from the speech synthesis module to the inventory 
databases; (d) a library containing one or more library databases, 
wherein ... 

. . . fleet. 

4. The inventory access system of claim 1, wherein the at least one user
interface is an integrated voice recognition module. . A method for
facilitating voice-activated inventory access using a system that has
automated primary capabilities and live assistance capabilities... 
. . .module, wherein the switch module directs and controls the flow of 
information between the inventory interface , the library, and user 
interface via the voice 

recognition module; 
wherein the user selects a system transaction from a menu in the library, 
is . . . 

Detailed Description

The elements of voice portal subsystem are as shown in Figure 6: a 
telephone system interface 220, a speech recognition module 240, a 
TTS module 250, a touch-tone module 260, and an audio 1 . . . to her over a 
browser 620. 

Returning now to Figure 6, in this exemplary implementation, middleware 
in the form of Netegrity's SiteMinder product suite is used to abstract 
the policy. . . 

capability.

In another aspect, the invention provides an operating model for a
telephone-based audio or speech recognition and text to voice
interfaced goods and services information, enhanced 411 type directory
assistance and referral service having merchant... Menlo Park, CA 94025. A
VOS-Nuance integration Run-Time Link Library (RLL) provides an interface 

between the Parity VOS software and the Nuance speech recognition 
system. An embodiment of a VOS-Nuance Integration Run-Time Link Library 
(RLL) is available... 

...application specific program modules used for the inventive Talk411 
system and method include a Nuance voice recognition interface 
module and a SpeechPro module which converts numeric, date and currency 
values to speech. Other... is anticipated that Version 8 will be released 
shortly. 

Version 8 may provide some additional voice processing or recognition 
capability, or provide an interface or other integration with Nuance 
Voice Recognition (and possibly including others, such as SpeechWorks 
software). The interface software is presently a piece ... service centers, 
and the like as are known in the telephony and server technologies. The 
Middleware 474 is the layer of software that integrates operations
of the Web Server, LDAP ... implemented from an Internet or 

other networked computing environment using conventional keyboard, mouse, 
and display interfaces or with the addition of voice input and 
speech - recognition capability. 

Therefore it should be understood that the system and method described 
relative to the. . . 

Claim 

database and for retrieving and editing said information, at least a 
component of said merchant interface comprising a voice-recognition
interface and an internet interface; and a consumer interface for
inputting voice commands and data having a voice recognition
component and for receiving merchant information and processed 
information from said database in response to... 

Linux on the other computers 10-60, and the networking software may be
Novell Netware. 

Middleware software 200
Communicating through the networking software is a so-called
"middleware" program layer 200 on each of the computers 10. The function
of the "middleware" is to allow different programs on different
computers to share information and call functions on each other. In this
embodiment, the "middleware" layer 200 is provided by Object Request
Brokers (ORB) consistent with the Common Object Request... 9 1997 and also 
available at 

http://www.cs.wustl.edu/~schmidt/oopsla.html

The middleware software 200 provides at least one event channel 116
for each applications program... another program. The format of the event
data for each program is defined by the interface description of that 
program. 

The speech synthesis program 126 is provided with two event queues; 
an 

input event channel 152 at which. . . 

...0 is arranged to send messages to, 

and receive messages from, the graphical user interface (GUI) program
120; the audio interface program 122; the speech recogniser program 
124; the text speech program 126; and, optionally (not shown in Figure 



File 9:Business & Industry(R) Jul/1994-2004/Sep 17

(c) 2004 The Gale Group 
File 15:ABI/Inform(R) 1971-2004/Sep 18 

(c) 2004 ProQuest Info&Learning
File 16: Gale Group PROMT (R) 1990-2004/Sep 20 

(c) 2004 The Gale Group 
File 20:Dialog Global Reporter 1997-2004/Sep 20 

(c) 2004 The Dialog Corp. 
File 47: Gale Group Magazine DB (TM) 1959-2004/Sep 20 

(c) 2004 The Gale Group
File 75:TGG Management Contents (R) 86-2004/Sep W2 

(c) 2004 The Gale Group 
File 80:TGG Aerospace/Def.Mkts(R) 1986-2004/Sep 20

(c) 2004 The Gale Group 
File 88:Gale Group Business A.R.T.S, 1976-2004/Sep 17 

(c) 2004 The Gale Group 
File 98:General Sci Abs/Full-Text 1984-2004/ Jul 

(c) 2004 The HW Wilson Co. 
File 112:UBM Industry News 1998-2004/ Jan 27 

(c) 2004 United Business Media 
File 141: Readers Guide 1983-2004/ Jul 

(c) 2004 The HW Wilson Co 
File 148: Gale Group Trade & Industry DB 1976-2004/Sep 20 

(c) 2004 The Gale Group
File 160:Gale Group PROMT(R) 1972-1989

(c) 1999 The Gale Group 
File 275: Gale Group Computer DB (TM) 1983-2004/Sep 20 

(c) 2004 The Gale Group 
File 264: DIALOG Defense Newsletters 1989-2004/Sep 20 

(c) 2004 The Dialog Corp. 
File 484 : Periodical Abs Plustext 1986-2004/Sep W2 

(c) 2004 ProQuest 
File 553: Wilson Bus. Abs. FullText 1982-2004/ Jul 

(c) 2004 The HW Wilson Co 
File 570:Gale Group MARS (R) 1984-2004/Sep 20 

(c) 2004 The Gale Group 
File 608:KR/T Bus.News. 1992-2004/Sep 20

(c)2004 Knight Ridder/Tribune Bus News 
File 620:EIU:Viewswire 2004/Sep 17 

(c) 2004 Economist Intelligence Unit 
File 613:PR Newswire 1999-2004/Sep 20

(c) 2004 PR Newswire Association Inc 
File 621:Gale Group New Prod.Annou.(R) 1985-2004/Sep 20

(c) 2004 The Gale Group 
File 623: Business Week 1985-2004/Sep 17 

(c) 2004 The McGraw-Hill Companies Inc 
File 624: McGraw-Hill Publications 1985-2004/Sep 17 

(c) 2004 McGraw-Hill Co. Inc 
File 634: San Jose Mercury Jun 1985-2004/Sep 18 

(c) 2004 San Jose Mercury News 
File 635: Business Dateline (R) 1985-2004/Sep 18 

(c) 2004 ProQuest Info&Learning
File 636: Gale Group Newsletter DB (TM) 1987-2004/Sep 20 

(c) 2004 The Gale Group 
File 647: CMP Computer Fulltext 1988-2004/Sep W2 

(c) 2004 CMP Media, LLC 
File 696: DIALOG Telecom. Newsletters 1995-2004/Sep 20 

(c) 2004 The Dialog Corp. 
File 674: Computer News Fulltext 1989-2004/Aug W4 

(c) 2004 IDG Communications 



File 810:Business Wire 1986-1999/Feb 28

(c) 1999 Business Wire 
File 813: PR Newswire 1987-1999/Apr 30 

(c) 1999 PR Newswire Association Inc
File 587:Jane's Defense&Aerospace 2004/Aug W4

(c) 2004 Jane's Information Group 



Set     Items    Description
S1      129240   MIDDLEWARE??? OR MIDDLE()WARE? ?
S2      3376     S1(3N)LAYER?
S3      171841   (SPEECH?? OR VOICE??)(3N)(APPLICATION?? OR SOFTWAR??)
S4      11429    (SPEECH?? OR VOICE??)(3N)ENGINE??
S5      2        (SOFTWARE?? OR APPLICATION?? OR ENGIN??)(3N)(IDEPENDENT? ?)
S6      328610   (SOFTWARE?? OR APPLICATION?? OR ENGIN??)(3N)(INTERFACE?? OR
                 GUI?? OR GRAPHICAL()USER()INTERFACE?)
S7      7241     (SPEECH?? OR VOICE??)(3N)(RECOGNI????? OR SYNTHES????)(3N)
                 (INTERFACE?? OR GUI?? OR GRAPHICAL()USER()INTERFACE?)
S8      253410   API OR APPLICATION()PROGRAM????()INTERFACE?
S9      534      S2 AND (COUPL???? OR BETWEEN OR NEGOTIAT???? OR LINK??? OR
                 MEDIAT????) AND APPLICATIONS AND ENGINE? ?
S10     821      AU=(SCHMID P? OR SCHMID, P? OR LIPE R? OR LIPE, R? OR
                 CHAMBERS R? OR CHAMBERS, R? OR CONNELL E? OR CONNELL, E?)
S11     0        S10 AND S1
S12     0        S10 AND (S3 OR S4)
S13     593      S1(S)(S3 OR S4)
S14     0        S1 AND S5
S15     2        S5 AND (S3 OR S4)
S16     33       S13(S)S7
S17     14       RD (unique items)
S18     212      S13(S)((SPEECH?? OR VOICE??)(3N)(RECOGNI? OR SYNTHES?))
S19     27       S18(S)(INDEPENDEN? OR GENERAL?)
S20     21       RD (unique items)
S21     17       S20 NOT (S15 OR S16)



15/3,K/1 (Item 1 from file: 16)

DIALOG (R) File 16: Gale Group PROMT (R) 

(c) 2004 The Gale Group. All rts. reserv. 

03746680 Supplier Number: 45318475 (USE FORMAT 7 FOR FULLTEXT) 
IBM Begins Testing PowerPC 
InformationWeek, p15
Feb 6, 1995 

Language: English Record Type: Fulltext 

Document Type: Magazine/Journal; Tabloid; General Trade

Word Count: 278 

PCs and its OS/2 for PowerPC operating system; both slated for 
late-spring release. Idependent software vendors (ISVs) included in the 
first tests of OS/2 for PowerPC say they are... 

...with 16 Mbytes of RAM, a CD-ROM drive, stereo jacks, and dictation and
navigation speech-recognition software, according to testers.

One disappointment: The rumored 615 PowerPC chip, which was reported 
to run. . . 

PCs and its OS/2 for PowerPC operating system; both slated for
late-spring release. Idependent software vendors (ISVs) included in
the first tests of OS/2 for PowerPC say they are...

with 16 Mbytes of RAM, a CD-ROM drive, stereo jacks, and dictation
and navigation speech-recognition software, according to testers.

One disappointment: The rumored 615 PowerPC chip, which was reported 
to run. . . 

...ABSTRACT: the first public beta of Speech Server will ship with Beta 3
of Microsoft's Speech Application SDK (Software Development Kit) in
what signals speech technology's return to the corporate agenda. Due for
manufacturing release before mid-2004, the product will include a text-to-
speech engine from SpeechWorks - Microsoft's own speech-recognition
engine - and a telephony interface manager. The offering will also
include middle-ware that is being designed in partnership with Santa
Clara, Calif.-based Intel and Dallas-based...

... the Microsoft product to an enterprise telephony infrastructure. But it
is the server's SALT (Speech Application Language Tags) voice browser
that sets Microsoft apart from the standards crowd. Rather than adhering to
VXML (Voice...

...TEXT: 2004, the product will include a text-to-speech engine from
SpeechWorks - Microsoft's own speech-recognition engine - and a
telephony interface manager. The offering will also include middle-ware
that is being designed in partnership with Santa Clara, Calif.-based
Intel and Dallas-based...

content.

As part of this agreement, VoiceGenie plans to integrate with the
IBM's WebSphere Voice Application Access (WVAA) solution, middleware
software that extends... Providing a voice portal framework that allows
enterprises to deliver information to mobile employees using voice as the
interface, its voice recognition technology can link telephone
callers to enterprise data and applications previously accessible only via
computer...

... in Heidelberg in 1997 and develops, produces and sells a middleware
platform for building extensive speech automation applications under
the brand name robot 5. The platform enables rapid development of speech
recognition and speech synthesis applications by way of a structured
graphical user interface. All voice robots products are supported
by a full range of services, including custom development, training,
maintenance...

... Version 8.0 platform, Vonetix is voice and wireless interface
middleware that connects databases, enterprise software packages, speech
recognition engines and wireless mobile-commerce interfaces. Vonetix
helps eliminate the need for companies to develop separate back-end
interfaces to applications...

... LumenVox.com Press Area URL: www.LumenVox.com/Press LumenVox's
award-winning suite of speech recognition software includes the Speech
Driven Information System (SDIS), a GUI toolkit wrapped around the
Speech Recognition Engine (SRE), a low level API designed to slip
into any telephony application, and LV Speech Tuner a complete
maintenance tool to improve recognition of any speech application.
LumenVox is a speech recognition company with over a decade of telephony
experience. Technology is based of many best...

... their suite of software and worldwide partners, they can design,
develop, deploy, and maintain any speech application. Company:
Magnetek, Inc. Ticker Symbol: MAG Booth/Stand: 27720 Media Contact: Melbie
Vinson, 972-484...

... speech automation applications under the brand name robot 5. The
platform enables rapid development of speech recognition and speech
synthesis applications by way of a structured graphical user
interface. All voice robots products are supported by a full range of
services, including custom development...

technologies will also be showcased, including: e-commerce
applications, software and middleware providers, wireless components,
speech recognition software, mapping software, users interface
designs, network carriers and operating system vendors.

For more information on Mobile Commerce at eBusiness...

will be showcased as well including: e-commerce applications,
software and middleware providers, wireless components, speech
recognition software, mapping software, users interface designs,
network carriers and operating system vendors. Mobile Commerce will also
attract wireless service providers...

... broad-based technologies that include: e-commerce applications,
software and middle-ware providers, wireless components, voice
recognition software, location software, user interface designers,
network carriers, and operating system vendors. The show will also attract
such vendors as...

... Commerce applications also incorporate a common set of business
rules, application programming interfaces and transaction middleware,
which are shared between speech and Web applications. The database
makes up the third tier of a V-Commerce application. The V-Commerce...

...Commerce Application Development Three new technologies will enable easy
and rapid development of V-Commerce applications: * Java and ActiveX
Speech APIs -- Nuance recognition and verification functionality will be
accessible through Java and ActiveX APIs enabling...

...using popular Java and ActiveX development environments. * SpeechObjects
Nuance SpeechObjects are a set of reusable speech application
components designed to speed the development of speech systems. Using the
API's described above...

... recognition easy for all developers. Visual Basic, Java and C++
programmers can now easily integrate speech applications into existing
systems without a lengthy learning curve. SpeechObjects are designed to be
portable so...

... sites. VoxML, used in conjunction with SpeechObjects, enables HTML
developers to easily extend their Web applications to incorporate speech 
recognition. About the V-Commerce Alliance The V-Commerce Alliance 
includes leading technology software, services... 

... working with Nuance to build and deploy V-Commerce applications. In 
addition to natural language speech recognition, V-Commerce applications 
will typically include e-commerce and application servers, packaged 
applications and telephony hardware and software... 

...s Version 8 CONVERSANT System for interactive voice response (IVR).
Vonetix is voice and wireless interface middleware that connects
databases, enterprise software packages, speech recognition engines
and wireless, mobile commerce interfaces. The integration is expected to
benefit Avaya customers who implement the network-based integrated solution 

...VoiceXML Gateway

software on IBM middleware, the new solution is designed to allow customers 
to 

voice-enable existing web applications via IBM's WebSphere Application
Server
(WAS) and will enable any IBM WebSphere Application Server...
... content.

As part of this agreement, VoiceGenie plans to integrate with the IBM's
WebSphere Voice Application Access (WVAA) solution, middleware 
software that 

extends the WebSphere portal infrastructure and programming model to voice 
Providing a voice portal framework that allows enterprises to deliver 
information to mobile employees using voice as the interface , its 
voice 

recognition technology can link telephone callers to enterprise data and 
applications previously accessible only via computer... 

...Forum, VoiceGenie and IBM have 

always demonstrated industry foresight in promoting low-cost, 
open- standards 

speech applications . Today, we announce the exciting PervasiveGenie 
speech 

platform that will allow VoiceGenie and IBM to aggressively increase its 
leadership in the voice applications space, " said Stuart Berkowitz, 
President 

and CEO, VoiceGenie Technologies. "VoiceGenie is committed to delivering 
market. . . 

...solutions to our global enterprise and telecom service 

provider customers. With integration to IBM's speech recognition and 

middleware software , future collaboration in the area of multimodal 
applications, as well planned increased coverage of our... 
...and 2.0 specifications, allows 

enterprises and telecom carriers to develop and deploy sophisticated IVR 
applications , speech -enabled services, and voice portals. Engineered 
within a 

open standards-based architecture with high levels... 

. . .performance, and reliability, the platform has provided the 
infrastructure for the development of hundreds of speech applications 
worldwide and is currently answering millions of calls for customers each 
day . 

The solution will... 
...applications more quickly and simply running 

VoiceGenie 's VoiceXML Gateway software on standards-based IBM middleware 
it 

said Rodney Adkins, General Manager, IBM Pervasive Computing Division. 
Additionally, VoiceGenie will be making a... 

As part of this agreement, VoiceGenie plans to integrate with the
IBM's WebSphere Voice Application Access (WVAA) solution, middleware
software that extends the WebSphere portal infrastructure and programming
model to voice. Providing a voice portal framework that allows enterprises
to deliver information to mobile employees using voice as the interface,
its voice recognition technology can link telephone callers to
enterprise data and applications previously accessible only via computer...

Voice-activated services gaining attention
SPEECH RECOGNITION: VXML, SALT aid development. 

Byline: STEPHEN LAWSON 

Journal: Network World Page Number: 28 

Publication Date: August 11, 2003 
Word Count: 1255 Line Count: 113 

Text: 

... HTML applications it uses on the Web for selected services and let MCI
create an interface between those applications and a speech 

recognition system. Now when club members move they can enter a change of 
address without using... 

... by calling in to an automated system, says Marcello Typrin, director of 
product marketing at speech software vendor Nuance Communications . 

Providing a speech -based interface to applications is a good thing for 
companies to outsource to a carrier, says Mark Plakias, an... 

while leading existing vendors to offer alternatives to their 
proprietary software platforms using VXML interpreter software 
Meanwhile, the Speech Application Language Tags (SALT) standard, backed 
by Cisco, Intel and Microsoft, also is coming on the... 

... VoiceGenie's product is an example of how the specification can work. 
The company makes middleware that runs on Linux. That middleware is the 
interface between speech recognition systems that process what a

caller says and VXML applications that answer or carry out... 

. . . caller requests, says Eric Jackson, vice president of strategy and 
business development at VoiceGenie. Traditionally, interfaces between 
speech recognition systems and back-end applications have come in the 

form of proprietary software that speech recognition platform vendors 
have written for their own systems, according to Zelos Group's Plakias. The 
advent of VXML makes voice -enabled applications less dependent on the 
platforms on which they run. As soon as each platform maker... 

. . . for Verizon call centers that also are being offered to other carriers 
and corporations. "With Voice XML, the application you build is really 
yours, regardless of what systems you want to deploy," says Marie Meteer, 
director of call center solutions at BBN. Once an application for a 
voice -activated service has been written, it doesn't have to be rebuilt 
from scratch if... 

TEXT:

New middleware based on Extensible Mark-up Language, or XML, and tailored 
for the health care market. . . 

...Software, Columbia, Md., in conjunction with Microsoft Corp., Redmond, 
Wash., has introduced Interchange 98. The middleware works with Health 
Level 7 standards to simplify the flow of clinical data among disparate 
information systems. Sequoia's Interchange 98 uses XML, a subset of the 
Internet's standard generalized markup language. It also uses Kona, a 
standard in development designed to help health care organizations organize 
specialty-specific clinical data for XML. Sequoia is selling the 
middleware packaged with Microsoft's Back Office suite of products. At 
HIMSS, the company demonstrated Interchange 98 by linking systems from nine 
vendors. Three companies, Lernout & Hauspie, which sells speech 
recognition software ; Datx Engstrom, which sells anesthesia monitors; 
and Infosys, which sells practice management software, already have... 

... pitch, speed, and volume to customize a particular voice to
personal preferences. To accomplish Automatic Voice Recognition (ASR),
VoiceCentral offers advanced Speaker-Independent (SI) technology that
enables users to begin talking to their PDA with no voice training...

...native Pocket PC applications — a key advantage that makes VoiceCentral 
more user-friendly than older voice -activated Pocket PC applications 
which require users to enter a middleware interface or learn new 
software . 

Voice Activation 
Once activated by the push of a button, VoiceCentral is controlled 
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completely by voice... 
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TEXT: 

...their first taste of CTI technology. "We've seen the first indication 
that CTI in general is making a move from being basically in the domain 
of the call center to moving to the desk of the general knowledge 
worker," said Ken Landoline, analyst for Norwell, Mass.-based Giga 
Information Group. Landoline said... 

...of it." Another segment of the CTI industry that continued to grow was 
telephony-based voice recognition applications. "With companies like 
Nuance and ALTech (now SpeechWorks), there's very fast growth there," Hills 

...and ACD equivalents that are using the PC platform." The development of 
the call center/middleware marketplace was a big step for the market in 
1998, Landoline contended. "With companies like GeoTel and Genesys, the 
call center/middleware market has been tremendous," Landoline said. 
"They're doing what should have been done years... 

standards for health care identifiers and transaction processing 
codes. 

Sequoia Introduces XML-based Middleware 

New middleware based on Extensible Mark-up Language, or XML, and 
tailored for the health care market... 

...Software, Columbia, Md., in conjunction with Microsoft Corp., Redmond, 
Wash., has introduced Interchange 98. The middleware works with Health 
Level 7 standards to simplify the flow of clinical data among disparate 
information systems. Sequoia's Interchange 98 uses XML, a subset of the 
Internet's standard generalized markup language. It also uses Kona, a 
standard in development designed to help health care organizations organize 
specialty-specific clinical data for XML. Sequoia is selling the 
middleware packaged with Microsoft's Back Office suite of products. At 
HIMSS, the company demonstrated Interchange 98 by linking systems from nine 
vendors. Three companies, Lernout & Hauspie, which sells speech 
recognition software; Datx Engstrom, which sells anesthesia monitors; 
and Infosys, which sells practice management software, already have... 

VoiceGenie and Sun form strategic alliance for deployment of next 
generation open architecture speech solutions 

... new and existing applications even as the environment evolves. The 
VoiceGenie platform has received numerous independent product awards. 
Officially founded in January 2000, VoiceGenie's early work on its core 
product... 

... cablesedge.com. Century 21 Landstars Realty Inc., based in 
Richmond Hill, Ontario, has over 80 independent associate brokers and 
sales representatives. For more information, please contact (905) 707-1188 
or (416... 

... Technologies, a developer of thin client, web-based software for wired 
and wireless markets; CablesEdge Software, a developer of 
voice-interactive, software solutions for connecting broadband to wireless 
technology for home and business; eSalveo Corp., a developer... 

... past, few development platforms focused specifically on the 
telematics industry, and the ones that did generally lacked the 
horsepower to support multiple applications such as voice recognition and 
Java. As a... 

... Neutrino RTOS, the Biscayne platform allows OEMs to build and test a 
rich array of middleware applications and respond to market demand for 
fully integrated telematics solutions. 

... applications and is a key player in the growing SALT ecosystem," 
said X.D. Huang, general manager of the .NET Speech Technologies group 
at Microsoft. "By responding quickly to the growing demand for SALT 
applications and platforms, HeyAnita is empowering customers to deploy 
speech applications that fully leverage their Web investments." 

"HeyAnita's support of SALT is totally aligned with... 

... We deliver solutions that give companies choice at every level; 
including programming language of choice, speech recognition engine 
of choice and telephony hardware of choice." Providing voice access to a 
wide variety of... 

... Contributor in January 2002. The SALT Forum aims to create an open, 
royalty-free, platform-independent standard for speech enabling 
multimodal and telephony applications. SALT will make possible multimodal 
and telephony... 

... PCs, wireless personal digital assistants (PDAs) and telephones. About 
HeyAnita Inc. HeyAnita is a leading voice software company focused on 
providing an interactive voice and multimodal interface to any device. 
HeyAnita's... 

... includes the FreeSpeech(TM) Platform, FreeSpeech(TM) Browsers (SALT and 
VoiceXML) as well as prepackaged voice applications, developer tools 
and professional services. Companies worldwide using HeyAnita's voice 
solutions include Sprint PCS... 

...HeyAnita Korea). HeyAnita Inc. is a privately held company headquartered 
in Los Angeles, CA. Sample voice applications can be heard by calling 
HeyAnita's technology showcase: 1-800-44-ANITA. For more... 

... browser components, and framework components that allow ASP.NET Web 
developers to extend their Web applications with speech recognition 
for multimodal (audio in, visual out) and voice-only interaction. The beta 
version of the Microsoft .NET Speech SDK allows developers to build 
speech-enabled Web applications and interact with them using Microsoft 
Internet Explorer client components that ship with the SDK... 

... has made the L&H Speech Recognition ASDK (Application Software 
Development Kit) for PlayStation 2 generally available. The ASDK allows 
developers to easily integrate speech recognition features into games and 
edutainment . . . 

... strategy games increases player interactivity and enhances the overall 
gaming experience. L&H's ASDK middleware will be showcased for the first 
time from the 2nd to 3rd of August in... 

TEXT: 

...dynamic scalability and fault-tolerance of QNX Neutrino allows Biscayne 
to support a variety of middleware applications, thus speeding overall 
development cycles and time to market for manufacturers. 

In the past, few development platforms focused specifically on the 
telematics industry, and the ones that did generally lacked the 
horsepower to support multiple applications such as voice recognition and 
Java. As a result, OEMs struggled to build the sophisticated and 
feature-rich solutions... 

...Neutrino RTOS, the Biscayne platform allows OEMs to build and test a 
rich array of middleware applications and respond to market demand for 
fully integrated telematics solutions. "The Biscayne platform lets... 

...Unit at Hitachi Semiconductor (America) Inc. "Since Biscayne is 
high-powered and can support multiple middleware technologies, the need 
for a dependable OS is paramount. QNX's track record for making... 

...goal of COMET is to provide vertically integrated telematics solutions 
to the automotive industry, galvanize middleware pieces from third-party 
vendors into one cohesive solution, formalize partnerships within the 
telematics ecosystem... 

...Industry Support 

The speech system in Windows CE for Automotive 3.5 supports 
phonetic-based speech recognition and human-sounding text-to-speech 
technologies through SAPI 5.0. The system also offers... 

...handling incoming requests and interactions, enabling the speech system 
to work easily and seamlessly with applications from speech engine vendors 
and speech application developers. 

Microsoft has joined with many speech vendors to build speech recognition 
and text-to-speech engines that are compatible with Windows CE for 
Automotive 3.5 via the SAPI 5.0... 

...ScanSoft recently developed a SAPI 5.0-compliant version of the 
RealSpeak Compact text-to-speech engine." 

"The speech system in Windows CE for Automotive 3.5 provides developers 
with the tools to easily... 

...Internet and wireless-enabled solutions driven by speech," said D. Lynn 
Shepherd, vice president and general manager for Mobile and Wireless at 
Fonix. "These solutions can range from Internet data access... 

...s a key success factor." 

"Windows CE for Automotive is a powerful platform for emerging 
speech-enabled automotive applications," said Nobuaki Tanaka, senior 
managing director of Asahi Kasei. "With SAPI 5.0 compliance, our VORERO 
speech-recognition engine can be easily integrated in a wide range of 
automotive devices by developers worldwide." 

"Windows... 

...major car manufacturers and tier-1 suppliers in Japan," said Tomoaki 
Nakamura, manager of the Middleware System Design Department, 
Semiconductor and Integrated Circuits, at Hitachi. "Microsoft's SAPI 5.0 
is... 

TEXT: 

...to-market and supports virtually all ASR and TTS engines on the 
market today. This middleware enables a developer to write to these 
engines through SAPI, even though the TTS engine... 

...month) and the FAAST Framework provide that level of flexibility." 

Mark Hamilton, Vice President and General Manager of the CT and Server 
Group at Fonix Corporation, states, "To ensure that the... 

...Framework addresses the needs of IVR, contact center, and CRM solution 
providers who want to voice-enable an application, we've worked closely, 
over the last few years, with companies across various industries in... 

...now. Fonix is committed to providing the tools that customers need to 
effectively integrate diverse speech engines and to promote the use of 
natural-user interfaces." 

Uses of FAAST 

For some applications... 



21/3, K/13 (Item 1 from file: 636) 

DIALOG (R) File 636: Gale Group Newsletter DB(TM) 
(c) 2004 The Gale Group. All rts. reserv. 



05045165 Supplier Number: 76934230 (USE FORMAT 7 FOR FULLTEXT) 

Lernout & Hauspie speech products announces L&H Speech Recognition ASDK for 
PlayStation 2; Flexible middleware enables easy integration of speech 
recognition features into games and edutainment titles; Speech 
recognition increases player interactivity and enhances overall gaming 
experience. 

M2 Presswire, pNA 

August 2, 2001 

Language: English Record Type: Fulltext 
Document Type: Newswire; Trade 
Word Count: 811 

today announced that its Speech and Language Technologies Division 
(SLT) has made the L&H Speech Recognition ASDK (Application 
Software Development Kit) for PlayStation 2 generally available. The 
ASDK allows developers to easily integrate speech recognition features 
into games and edutainment titles. Employing speech recognition in 
adventure, role-playing or strategy games increases player interactivity 
and enhances the overall gaming experience. L&H's ASDK middleware will be 
showcased for the first time from 

...are among the companies blazing the XML trail. Explaining XML As a 
subset of standard generalized markup language, a widely used Internet 
programming code, XML has its origin in hypertext markup... 

...toward XML. In partnership with Microsoft, Sequoia Software Corp., 
Columbia, Md., recently introduced XML-based middleware geared to health 
care technology vendors and providers. By using XML and the Internet, 
Sequoia... 

...systems, Mason adds. Pioneering Vendors Several vendors already have 
licensed Sequoia's new XML-based middleware, described as a "transaction 
server," and will embed it into their own software. These include... 

...of health care information networking software; InfoSys, a Schaumburg, 
Ill.-based vendor of practice management software; and Lernout & Hauspie 
Speech Products, a Burlington, Mass.-based vendor of voice recognition 
technology. The standard licensing cost for Sequoia Interchange98 is 15% of 
the list price of... 

TEXT: 

...CTI past and predict CTI future. 

"We've seen the first indication that CTI in general is making a 
move from being basically in the domain of the call center to moving 
to the desk of the general knowledge worker," says Ken Landoline, 
analyst for Norwell, Mass.-based Giga Information Group. 
Landoline says... 

...of it." 

Another segment of the CTI industry that continued to grow was 
telephony-based voice recognition applications. 
"With companies like Nuance and ALTech (now SpeechWorks), 
there's very fast growth there," Hills... and ACD equivalents that 
are using the PC platform." 

The development of the call center/middleware marketplace was a 
big step for the market in 1998, Landoline says. 

"With companies like GeoTel and Genesys, the call 
center/middleware market has been tremendous," Landoline says. 

"They're doing what should have been done years... over the next 10 years, 
according to research from 
Burlington, Mass.-based Ovum Inc., an independent technology analyst 
group. Ovum's report, "Unified Messaging Services: Market 
Strategies," points to the Internet... 

Toward the mobile enterprise, step by step 

With improved devices, apps and infrastructure, wireless is making headway 
as a solid new data center technology. Analyst Mark Lowenstein offers 
a 10-point evaluation guide. 

Text: 

... major wireless operators. These factors made for complex wireless 
projects involving a veritable circus of middleware, gateway and system 
integrator vendors. As a result, outside of BlackBerry, which counts about 
1... 

... several fronts: * Wireless networks have improved. Coverage is better, 
and 2.5G networks such as General Packet Radio Service/Enhanced Data 
Rates for GSM Evolution and Code Division Multiple Access 1x... 

... applications with high input requirements, provide devices with qwerty 
keyboards or perhaps some level of voice recognition. For heavy use 
while driving, consider a good car kit to improve reception and provide... 

... up speed. This is where developments in location services, presence and 
multi-modal (such as voice recognition and text-to-speech) 
applications might help optimize the mobile experience. Lowenstein is 
managing director of Mobile Ecosystem, a leading... 

Desktop computer-telephone integration; Fact or fiction? 

Protocols abound, but rationalizing them against a welter of proprietary 
systems is tough. 

Text: 

...the Multi-vendor Integration Protocol (MVIP) - have enabled development 
of software-based fax and interactive voice-response applications 
without the need to buy the proprietary boxes that are characteristic of 
voice peripherals. These PC bus protocols let add-in boards - such as those 
used for fax, speech recognition or text-to-speech conversion - 
interoperate. But users who install SCSA- or MVIP-compliant voice 
processing applications in hardware-independent telephony servers on 
LANs have only a piece of the puzzle. They still need call... 

... Windows 95, basic computer telephony capability could soon be in the 
hands of millions. But middleware is needed to provide a sophisticated 
front end for the end user. To make user... 

Abstract (Article Summary) 

Under terms of the agreement, Lucent will integrate NLSA with its CentreVu Response Solutions, a suite of 
offerings that provides customer care applications based on Integrated Voice Response (IVR) technology. 
CentreVu Response Solutions is based on Lucent's INTUITY CONVERSANT platform. 

The integration of the two companies' technologies dramatically simplifies the development of natural language 
understanding speech telephony applications, making speech systems accessible to a broader market. As a result 
of the agreement, Unisys pioneering NLSA technology now will be available to Lucent IVR developers worldwide, 
enabling them to build complex speech-based applications for CentreVu Response Solutions more quickly and 
easily. 

Unisys NLSA is also the only natural language technology that is speech-recognizer independent, offering 
Lucent developers a common interface for developing spoken language applications without being tied to the 
speech recognition engine deployed. This allows developers to "snap" speech recognition engines in and out of 
their programs based on business requirements or technology improvements. 

BLUE BELL, Pa.--(BUSINESS WIRE)--March 8, 1999-- 
Integration of Tools from Speech Technology Leaders Ease and 
Accelerate Development of Mass-Market Speech Applications 

Unisys Corporation and Lucent Technologies, Inc. today announced an agreement allowing Lucent to license 
Unisys award-winning Natural Language Speech Assistant (NLSA) tool suite. 

Under terms of the agreement, Lucent will integrate NLSA with its CentreVu Response Solutions, a suite of 
offerings that provides customer care applications based on Integrated Voice Response (IVR) technology. 
CentreVu Response Solutions is based on Lucent's INTUITY CONVERSANT platform. 

The integration of the two companies' technologies dramatically simplifies the development of natural language 
understanding speech telephony applications, making speech systems accessible to a broader market. As a result 
of the agreement, Unisys pioneering NLSA technology now will be available to Lucent IVR developers worldwide, 
enabling them to build complex speech-based applications for CentreVu Response Solutions more quickly and 
easily. 

Natural language speech systems often replace touch-tone menu systems for applications such as phone banking. 
The systems select key words and phrases that callers speak in a natural voice, such as "I want to open a new 
account." This capability simplifies the caller's experience and dramatically expands the potential for new types of 
self-service applications, providing businesses with substantial cost savings and opportunities for new revenue. 

Unisys NLSA simplifies the development and deployment of natural language applications with easy-to-use tools 
that streamline voice-user interface design, testing, grammar creation and word meaning analysis. Average 
development and testing time is reduced from months to days. 

Unisys NLSA is also the only natural language technology that is speech-recognizer independent, offering 
Lucent developers a common interface for developing spoken language applications without being tied to the 
speech recognition engine deployed. This allows developers to "snap" speech recognition engines in and out of 
their programs based on business requirements or technology improvements. 

"We want speech application development to be as easy and open as possible," said Denis Aull, director of 
Response Offers, Lucent Technologies. "Integrating the Unisys NLSA and INTUITY CONVERSANT technology 
delivers the strongest, yet simplest, solution we've seen for developing and deploying speech-based applications. It 
opens our platform to many speech recognition engines." 

Unisys and Lucent: Joining to Simplify Speech Application Development 

The agreement marks the second integrated offering under an alliance formed by the two companies in November, 
1998, when Unisys and Lucent Speech Solutions agreed to integrate NLSA with Lucent's state-of-the-art text-to- 
speech and automatic speech recognizer engines. 

This integrated software package, which will be available in the second quarter of 1999, provides a seamless 
connection between the speech recognition tools and engines, allowing developers to dramatically reduce 
development time and enable faster deployment of speech-based applications. 

"Tools like NLSA that make it faster and easier to develop and deliver speech applications without requiring 
complex programming skills help bring more natural speech applications to the market," said Joe Yaworski, vice 
president and general manager, Unisys Natural Language Understanding business initiative. "By working with 
Lucent, we are giving developers around the world who are familiar with the Lucent CONVERSANT environment 
access to these benefits." 

As part of the new agreement, NLSA also will be integrated with Lucent's Voice@Work, which enables 
developers to create custom applications for the INTUITY CONVERSANT platform. Because the Unisys Natural 
Language Speech Interpreter will be incorporated into CentreVu 
Response Solutions, developers can dramatically cut the time to deploy speech applications. The environment will 
include support for multiple languages, formats for speaking dates, numbers and currencies in any language 
supported by the speech recognizer engine. A true innovation for developers, the combination of Voice@Work with 
NLSA opens the platform to a wide range of speech engines in a single development environment. 

Unisys NLSA will be available as part of Lucent's CentreVu Response Solutions in May, 1999. 



About Lucent's CentreVu Response Solutions 



Lucent's CentreVu Response Solutions, based on the INTUITY CONVERSANT platform, uses advanced voice 
response technology to collect and provide a wide array of information to callers through voice and fax. 

CentreVu Response Solutions can respond to inquiries and handle entire transactions 24 hours a day. With 
advanced software, INTUITY CONVERSANT can transfer callers to the correct line after callers speak the desired 
name, and can support natural language speech recognition. 

About the NL Speech Assistant 



The Natural Language Speech Assistant is an advanced speech application development tool set. Unisys NLSA 
provides application developers not only with the tools for designing and creating speech applications, but also 
provides for application project management, development methodology and testing. 

Unlike all other natural language technology, Unisys NLSA keeps developers' applications from becoming obsolete 
by providing open tools to design and develop across multiple platforms and speech recognizers. Unisys has a 
complete reseller program for NLSA, including platform integration, marketing and sales support, technical support 
and training. Visit http://www.marketplace.unisys.com/nlu for more information. 



About the Companies 



Lucent Technologies, headquartered in Murray Hill, New Jersey, designs, builds and delivers a wide range of 
public and private networks, communications systems and software, data networking systems, business telephone 
systems and microelectronic components. Bell Labs is the research and development arm for the company. For 
more information on Lucent Technologies, visit our Web site at www.lucent.com. 



Unisys (NYSE:UIS) is more than 33,000 employees helping customers in 100 countries apply information 
technology to solve their business problems. Unisys solutions are based on a broad portfolio of global information 
services including systems integration, outsourcing, "repeatable" application solutions, consulting, network 
integration, remote network management, and multivendor maintenance and support, coupled with enterprise-class 
servers and associated middleware, software and storage. 



Repeatable solutions are focused on key vertical markets including financial services, transportation, 
telecommunications, government, publishing and other commercial markets. Headquartered in Blue Bell, 
Pennsylvania, in the Greater Philadelphia area, Unisys had 1998 annual revenue of $7.2 billion. Access the 
Unisys home page on the World Wide Web - http://www.unisys.com - for further information. 

Unisys is a registered trademark and NL Speech Assistant and NL Enabled are trademarks of Unisys 
Corporation. CentreVu and CONVERSANT are registered trademarks and INTUITY is a trademark of Lucent 
Technologies. All other brands and products referenced herein are acknowledged to be trademarks or registered 
trademarks of their respective holders. 

Abstract (Article Summary) 

SPEECHTEK, NEW YORK, Oct. 27 /PRNewswire/ -- Unisys (NYSE: UIS) and Philips Speech Processing today 
announced an agreement allowing Unisys to resell and integrate Philips automated speech recognizer (ASR) 
engine with Unisys award-winning Natural Language (NL) Understanding technology, opening new markets for 
both companies. As part of this agreement, Unisys will integrate the Philips ASR, SpeechPearl, into its NL 
Speech Assistant Toolkit. This integration will provide a turnkey industry-leading development environment from a 
single source to build and operate powerful speech-based applications. 



With this agreement Philips will be the inaugural member of the new Unisys ASR Reseller Program (see related 
release). By pre-integrating and bundling speech recognizers such as Philips SpeechPearl with the Unisys Natural 
Language technology, the Unisys ASR Reseller program will drive the growth of the speech-based application 
market to one day replace touch-tone applications with more efficient voice technology. 



"Unisys Natural Language tools are licensed by many of the leading IVR vendors offering large-vocabulary speech 
recognition," said Joe Yaworski, vice president and general manager of the Natural Language Understanding 
program, Unisys Corporation. "Our reseller agreement with Philips will provide our IVR partners with access to 
industry-leading speech recognition technology, supported by Unisys expertise in handling the complexities of 
integrating hardware, middleware, speech recognition and natural language technologies." 



Full Text (920 words) 

Copyright PR Newswire - NY Oct 27, 1998 

Industry: COMPUTER/ELECTRONICS 

SPEECHTEK, NEW YORK, Oct. 27 /PRNewswire/ -- Unisys (NYSE: UIS) and Philips Speech Processing today 
announced an agreement allowing Unisys to resell and integrate Philips automated speech recognizer (ASR) 
engine with Unisys award-winning Natural Language (NL) Understanding technology, opening new markets for 
both companies. As part of this agreement, Unisys will integrate the Philips ASR, SpeechPearl, into its NL 
Speech Assistant Toolkit. This integration will provide a turnkey industry-leading development environment from a 
single source to build and operate powerful speech-based applications. 

With this agreement Philips will be the inaugural member of the new Unisys ASR Reseller Program (see related 
release). By pre-integrating and bundling speech recognizers such as Philips SpeechPearl with the Unisys Natural 
Language technology, the Unisys ASR Reseller program will drive the growth of the speech-based application 
market to one day replace touch-tone applications with more efficient voice technology. 

Previously, developers were forced to turn to a variety of vendors for the pieces of technology required to build 
speech-based applications and then integrate those technologies. Since the Unisys and Philips technology 
combination will be sold predominantly through a well-developed distribution network of Interactive Voice Response 
(IVR) channel partners, developers will receive all the necessary technology components from a single source, with 
the most complex technology already integrated. 

"This agreement reinforces Philips strategy of partnering with leading-edge speech solution providers to become 
the dominant supplier of natural speech technologies," explained Paul Celen, COO of Philips Speech Processing. 
"This partnership enables both Unisys and Philips to develop and drive the latest speech recognition 
technologies in the telecommunications marketplace." 

"Unisys Natural Language tools are licensed by many of the leading IVR vendors offering large-vocabulary speech 
recognition," said Joe Yaworski, vice president and general manager of the Natural Language Understanding 
program, Unisys Corporation. "Our reseller agreement with Philips will provide our IVR partners with access to 
industry-leading speech recognition technology, supported by Unisys expertise in handling the complexities of 
integrating hardware, middleware, speech recognition and natural language technologies." 

About Unisys NL Speech Assistant 

The Unisys Natural Language (NL) Speech Assistant is an advanced speech application development tool set that 
is platform- and speech recognizer-independent. As part of the Unisys Natural Language Understanding suite of 
products, NL Speech Assistant provides application developers not only with the tools for speech application 
creation but also for application project management, development methodology, and testing. Through NL Speech 
Assistant, developers have a standard tool to create spoken language applications across platforms and speech 
recognizers, protecting their applications from obsolescence. 

About Philips SpeechPearl 

SpeechPearl is recognized as the leading speech recognition engine for both small and large vocabulary 
applications. Developers now have access to more than 23 languages developed by Philips Speech Processing 
and the commitment of a true global organization. More importantly, SpeechPearl is one of the only recognizers 
that can switch languages "on-the-fly". This allows development of robust multi-lingual applications and expands 
the developer's market potential in the U.S. and internationally. 

Availability 

The combined offering will be available early 1st Quarter 1999. Unisys has integrated the Philips SpeechPearl 
recognizer initially into several of its IVR channel partners' products and toolkits. Additional development will be 
done on an ongoing basis. 

The agreement covers all languages currently offered or in development by Philips. It has no restrictions and will 
include future language development. Core languages offered include: US English, US Spanish, UK English, 
Spanish, German, French, Dutch and Italian. Additional languages covered by the agreement include: Australian 
English, Canadian French, Canadian English, Danish, Swedish, Swiss German, Austrian German, Portuguese, 
Brazilian Portuguese, Greek, Japanese, Mandarin, Cantonese, Norwegian and Thai. 

About Philips 

Philips Speech Processing, an Atlanta-based business unit of Royal Philips Electronics (NYSE: PHG), consists of 
two lines of business: Speech Technology and Dictation. Philips has more than 40 years experience in the 
development and marketing of speech products and is the largest producer of professional dictation systems 
worldwide. The company also developed the world's first natural continuous speech recognition technology, and is 
one of the market leaders for products and technology in this fast growing market. 



Royal Philips Electronics of the Netherlands is one of the world's largest electronics companies, with sales of more 
than $39 billion in 1997. The company is a global leader in the production of television sets, lighting, home 
telephony products and electric shavers. Its 264,700 employees in more than 60 countries are active in the areas of 
semiconductors and components, consumer products, professional products and systems, lighting, and software 
and services. Philips is quoted on the NYSE (PHG), London, Frankfurt, Amsterdam and other stock exchanges. 
News from Philips Speech Processing is located at - http://www.speech.philips.com. 



About Unisys 



Unisys (NYSE: UIS) is more than 33,000 employees helping customers in 100 countries apply information 
technology to solve their business problems. Unisys solutions are based on a broad portfolio of global information 
services including systems integration, outsourcing, "repeatable" application solutions, consulting, network 
integration, remote network management, and multivendor maintenance and support, coupled with enterprise-class 
servers and associated middleware, software and storage. Repeatable solutions are focused on key vertical 
markets including financial services, transportation, telecommunications, government, publishing and other 
commercial markets. Headquartered in Blue Bell, Pennsylvania, in the Greater Philadelphia area, Unisys 1997 
annual revenue was $6.6 billion. Access the Unisys home page on the World Wide Web - http://www.unisys.com 
- for further information. 

Unisys is a registered trademark of Unisys Corporation. 



SpeechPearl is a trademark of Philips Speech Processing. All other brands and products referenced herein are 
acknowledged to be trademarks or registered trademarks of their respective holders. SOURCE Unisys 
Corporation 

Abstract (Article Summary) 

M2 PRESSWIRE-31 August 1999-UNISYS: Unisys leverages natural language leadership in customer interaction 
solutions for call centers (C)1994-99 M2 COMMUNICATIONS LTD BLUE BELL, PA -- To better support clients' call 
center implementation needs, Unisys has created a Natural Language Understanding (NLU) Services organization 
that will focus specifically on consulting and application development for customer interaction. The service group 
will leverage Unisys leadership position in speech recognition and natural language understanding technology to 
enable clients to successfully navigate the myriad of business and technical issues associated with applying natural 
language solutions across a range of customer contact channels, including telephone, email and Web chat. 



"With more customers demanding service across a variety of channels and expectations continuing to rise, 
companies face ever-increasing pressure to leverage customer contact technologies to handle the volume and 
complexity." Companies that utilize natural language applications allow their customers to interact with them in a 
natural, comfortable manner rather than through a predefined interactive voice response menu maze or a 
multilayered Web site. Using natural language, customers receive automated responses to questions and 
information about products and services more quickly and efficiently. 



Full Text (652 words) 

Copyright M2 Communications Ltd. Aug 31, 1999 

M2 PRESSWIRE-31 August 1999-UNISYS: Unisys leverages natural language leadership in customer interaction 
solutions for call centers (C)1994-99 M2 COMMUNICATIONS LTD BLUE BELL, PA -- To better support clients' call 
center implementation needs, Unisys has created a Natural Language Understanding (NLU) Services organization 
that will focus specifically on consulting and application development for customer interaction. The service group 
will leverage Unisys leadership position in speech recognition and natural language understanding technology to 
enable clients to successfully navigate the myriad of business and technical issues associated with applying natural 
language solutions across a range of customer contact channels, including telephone, email and Web chat. 



"Customer care is quickly becoming the critical differentiator in the battle for securing profitable and meaningful 
relationships with customers," said Tom Steinmetz, Director of Unisys North America Customer Interaction 
Solutions Center of Excellence. 
"With more customers demanding service across a variety of channels and expectations continuing to rise, 
companies face ever-increasing pressure to leverage customer contact technologies to handle the volume and 
complexity." Companies that utilize natural language applications allow their customers to interact with them in a 
natural, comfortable manner rather than through a predefined interactive voice response menu maze or a 
multilayered Web site. Using natural language, customers receive automated responses to questions and 
information about products and services more quickly and efficiently. 

Companies offering speech and natural language customer services solutions will benefit from efficiencies that can 
be realized through enhanced technology while maintaining a high level of personal service. In addition, with 
customers able to access information more conveniently using natural language tools, fewer direct customer 
interactions will be required in the company's call center. This reduction in call volume results in more time for 
customer service representatives to deal effectively with more complex - and possibly more opportunistic - 
customer inquiries. 

Unisys NLU Services is part of the North American Customer Interaction Solutions practice, which combines the 
company's expertise in enterprise systems design, legacy systems integration and solutions development. The 
group offers a full range of services including marketplace assessment, business process planning, proof-of-concept 
services and speech application development. 

Unisys assists clients in selecting solutions that match the business and technological needs specific to each 
interaction channel. In many cases, this could involve implementing two or three different natural language 
understanding solutions to cover those industries where clients will make contact over multiple media, such as 
telephone, Web-chat, and email. 

A key component of NLU Services is the Unisys Natural Language Speech Assistant (NLSA) software, a 
groundbreaking development toolkit that has eliminated much of the time, risk and expense once associated with 
deploying speech-based applications. 

"The award-winning Unisys NLSA Toolkit enables developers to design and create speech applications, as well as 
manage the application development process, independent of the voice recognition, interactive voice 
recognition or natural language platform on which it will be deployed," Steinmetz said. "Our leadership in this 
technology provides us with a deep understanding of how to deploy such solutions quickly and effectively." 

About Unisys 

Unisys is 34,000 employees helping customers in 100 countries apply information technology to solve their 
business problems. Unisys solutions are based on a broad portfolio of global information services including 
electronic business, systems integration including custom and "repeatable" application solutions, outsourcing, 
Microsoft Windows NT services, network services, and multivendor maintenance and support, coupled with 
enterprise-class servers and associated middleware, software and storage. Repeatable solutions are focused on 
key vertical markets including financial services, transportation, telecommunications, government, publishing, and 
other commercial markets. 

The company is headquartered in Blue Bell, Pennsylvania, in the Greater Philadelphia area. For more information 
on the company, access the Unisys home page on the World Wide Web at www.unisys.com. 

Investor information can be found at www.unisys.com/investor. 



CONTACT: Lucia Romano, Unisys Tel: +1 215 986 4698 e-mail: lucia.romano@unisys.com Sheri Loose, BSMG 
Worldwide for Unisys Tel: +1 972 830 2610 e-mail: sloose@bsmg.com *M2 COMMUNICATIONS DISCLAIMS ALL 
LIABILITY FOR INFORMATION PROVIDED WITHIN M2 PRESSWIRE. DATA SUPPLIED BY NAMED 
PARTY/PARTIES.* 




Google search: microsoft middleware speech "independent layer" 

Results 1 - 10 of about 11 for microsoft middleware speech "independent layer". (0.42 seconds) 

[PDF] ATLAS: A generic software platform for speech technology based ... 
File Format: PDF/Adobe Acrobat 
... of the various layers of the middleware as implemented ... four examples of how a speech 
recognition engine ... be an industry standard API, such as Microsoft's SAPI ... 
www.speech.kth.se/qpsr/tmh/2001/01-42-029-042.pdf - Similar pages 

[PS] TMH-QPSR 1/2001 
File Format: Adobe PostScript - View as Text 
... contents of the various layers of the middleware as ... several other Communicator engines 
(text-to-speech engine, parser ... 9 http://www.microsoft.com/speech/ 10 http ... 
www.speech.kth.se/qpsr/tmh/2001/01-42-029-042.ps - Similar pages 
[ More results from www.speech.kth.se ] 

Neohapsis Archives - Freshmeat News - #0026 - [fm-news] Newsletter ... 
... is a middleware application between a ... http://freshmeat.net/projects/speech-dispatcher 
... as improved Windows support (via Microsoft Research's ... 
archives.neohapsis.com/archives/apps/freshmeat/2003-12/0026.html - 82k - Cached - Similar pages 

[PDF] Introduction 
File Format: PDF/Adobe Acrobat - View as HTML 
... Pictures and video are ... handled ... a similar way by ... UNIX Yes Yes DOS No Yes 
Microsoft Windows No Yes Windows/NT No Yes Client ... 
www.fmi.uni-passau.de/lehrstuehle/hahn/lehre/uvk_ss_03/ENetw-lntro-SS03.pdf - Similar pages 

[PDF] An Application-Independent Speaker Adaptation Service 
File Format: PDF/Adobe Acrobat - View as HTML 
... between Cisco, Comverse, Intel, Microsoft, Philips and ... The Atlas multi-layered 
middleware, which allows ... application, dialog engine ASR speech detector ... 
www.e.kth.se/~e97_hes/thesis/report.pdf - Similar pages 

[fm-news] Newsletter for Friday, December 26th 2003 
... is a middleware application between a ... freshmeat.net/projects/speech-dispatcher/ 
... Windows support (via Microsoft Research's Detours ... 
info.ccone.at/NEWS/msg01860.html - 73k - Cached - Similar pages 

Technical Background, State of the Art 
... to date, only the research part of Microsoft have released a ... co-existence/interworking 
of middleware and legacy systems; ... for a wide range of speech codecs ... 
www.cordis.lu/infowin/acts/analysys/products/thematic/ngi/ch2/ch2.htm - 66k - Cached - Similar pages 

Digital Library Bibliography 
... styli, handwriting and voice recognition, speech synthesis, tiny ... http://msdn.microsoft.com/workshop/server/asp 
... Linux servers, as also, the middleware for access ... 
www-diglib.stanford.edu/~testbed/dlbibs/dlbib.html - 101k - Cached - Similar pages 

[PDF] C:\20th\!NEWPAGE.S10\!REV.- IN\!INTROSP REA (Page 2) 
File Format: PDF/Adobe Acrobat - View as HTML 
... and beyond Nothing Stops It! Page 2. Of all the winning 
attributes of the OpenVMS operating system, perhaps ... 
h71000.www7.hp.com/openvms/20th/vmsbook.pdf - Similar pages 

[PDF] Table of Content 



ATLAS: A generic software platform for speech 
technology based applications 

Håkan Melin 

Centre for Speech Technology (CTT), TMH, KTH, Drottning Kristinas väg 31, SE-100 44 Stockholm 

Abstract 

ATLAS is a Java software library that provides a framework for building multi- 
lingual and multi-modal applications, especially dialogue systems, on top of speech 
technology components. The design is based on a layered system model, where 
ATLAS sits as a middleware between an application-dependent layer and the 
speech technology components and implements much of the application-independent 
functionality in the system. ATLAS is itself layered with interfaces to speech 
technology components at the bottom and self-contained dialogue components at 
the top. The layered design is both efficient and flexible and is suitable for a 
research environment. The framework also provides support for 
application-dependent layers through a structure of an application with sessions interacting 
with users through terminals. The terminal concept supports creating audio device- 
independent applications that run transparently in both telephone and desktop 
environments. Several speech technology components are available for use with 
the ATLAS framework, including text-to-speech, speech recognition and speaker 
verification systems. Four applications that use ATLAS have so far been developed 
within student and research projects at the Centre for Speech Technology (CTT), 
including a speech controlled telephone banking system (CTT-bank) and an 
automated entrance receptionist (PER). 



1. Introduction 

This paper presents an effort at the Centre for 
Speech Technology (CTT) at KTH to create a 
framework for multi-modal and multi-lingual 
speech technology applications. The framework 
is called ATLAS and is a Java software library 
that includes a set of application programming 
interfaces (APIs) for speech technology 
components. The aim has been to code much of
the application-invariant, low-level functionality in
ATLAS and to provide application programmers
with a powerful, easy-to-use speech technology
API. ATLAS thereby defines a multi-layered
system architecture that encourages software
reuse. The framework is intended for building
demonstration systems in a research environment.

Human-machine interface design and 
usability issues are fundamental for the success 
of speech technology, and a demonstration 
system can be useful in studies on these topics. 
A demonstration system can also be useful in 
collecting speech data to support evaluation of,
for instance, speech recognisers. Usability
studies and speech technology evaluation were
also the two main goals of the CTT-bank project
(Ihse, 2000; Melin et al., 2001), one of the
projects where ATLAS has been used. 



With the growing commercial interest in 
speech technology based applications, and an 
increasing demand on research labs to do 
industry relevant research, it is also becoming 
more and more valuable to show practical 
examples of research advances. This often 
means live demonstrations of the technology in 
useful applications. Demonstration systems 
typically include several speech technology 
components, such as speech generation, text-to- 
speech synthesis, speech recognition, speech 
understanding, speaker recognition, and 
dialogue management. The components require 
complex interaction with each other and with 
audio devices, and the components are 
themselves complex. As a result, a demonstration
system is often a complex system. To
prevent system building itself from taking too much
effort away from the more research-oriented
tasks, such as improving basic speech
technology components, it is important to have
an efficient framework for building
demonstration systems. A framework can be
defined by, for instance, a suitable programming
language, a good system architecture and
reusable software components. It is important
that such a framework is flexible enough to
allow researchers to test new ideas, and that it
evolves with the state of the art in speech
technology. This requirement is challenging,
because it somewhat opposes the requirement
for efficiency. A framework that is efficient and
easy to use when building small demonstration
applications may not be flexible enough when
building, for example, state-of-the-art conversant
dialogue systems.

Several publications have reported on efforts 
in creating frameworks for speech technology 
applications. A well-known platform is Galaxy- 
II (Seneff et al., 1998). It was developed at MIT 
and has been used successfully in several 
applications such as the Jupiter, Voyager and
Orion systems. It has also been designated as the 
first reference architecture for the DARPA 
Communicator Program 1 and is now maintained 
and enhanced by MITRE (Bayer et al., 2001). 
Galaxy-II is a client-server architecture where 
all interactions between servers are mediated by 
a programmable hub and managed by a hub 
script. 

Jaspis 2 (Turunen & Hakulinen, 2000) is an 
agent based architecture designed with special 
focus on multi-linguality and user and 
environment adaptivity. Sutton et al. (1998) 
describe the OGI CSLU Toolkit 3 that includes 
several ready-to-use speech technology
components and a Rapid Application Developer
tool. Potamianos et al. (1999) review efforts in
defining design principles and creating tools for
building dialogue systems, including architectural
issues.

Several commercial companies offer platforms
for developing applications with speech
technology. Nuance markets SpeechObjects
(Nuance, 2000) as "a set of open, reusable
components that encapsulate the best practices
of voice interface design". SpeechObjects as a
component technology has been standardised
within the V-Commerce alliance 4. It is free
source and claimed to be portable between
platforms, spoken languages and speech
engines. Philips 5 markets SpeechPearl and
SpeechMania as speech recognition and speech
understanding-centric product families.
SpeechPearl includes SpeechBlocks, in concept very
similar to Nuance's SpeechObjects.

Related to the creation of generic platforms
are also several standardisation activities. The
World Wide Web Consortium 6 (W3C) specifies
markup languages for voice dialogues
(VoiceXML), speech recognition grammars,
speech synthesis markup, reusable dialogue
components, etc. ECTF 7 defines standards for
interoperability in the Computer Telephony
(CT) industry.

5 http://www.speech.philips.com/

6 http://www.w3.org/voice/

The ATLAS framework, presented in detail 
in this paper, has so far been used in four 
projects at CTT. It was developed within the 
PER (Pakucs & Melin, to appear) and CTT-bank 
projects (Ihse, 2000; Melin et al., 2001).
Demonstration systems created within these two 
projects, an automated entrance receptionist and 
a speech controlled telephone banking system, 
take advantage of most features in ATLAS. The 
platform has also been used within the Picasso 
Impostor Trainer project (Elenius, 2001) and the 
Horstod project (Johansson, to appear), where 
subsets of its features have been used. 

2. The system model 

ATLAS has been designed with the layered 
system model shown in Figure 1 in mind. The 
model has an application-dependent layer on
top, a resource layer at the bottom, and an
application-independent layer, the middleware,
in between. The upper side of the middleware is 
a powerful speech technology application 
programming interface (API), and the lower side 
(as seen from above) is a collection of APIs to 
speech technology components in the resource 
layer. 

The middleware is itself layered. Each layer 
adds more powerful functionality and 
abstraction to the set of primitives that are 
offered in the speech technology API. For 
retained flexibility, the lower layers are always 
made available to the application through the 
API. 

ATLAS is first of all an implementation of 
the middleware illustrated in Figure 1, but it also 
contains foundation classes for the application 
layer. 

Figure 1. The system model behind the ATLAS design. It is layered with an application-dependent 
layer on top, a resource layer at the bottom, and an application-independent layer in between. ATLAS 
is an implementation of the middle layer, and provides some support for implementing the top layer. 

2.1. Terminology and notation

When describing software structures in the
following sections, we borrow terms from the
object-oriented programming paradigm as used
with the Java programming language. In this
terminology, a class is a collection of data and
methods that operate on that data. A class is
usually created to specify the contents and
capabilities of some kind of object. An object
created from its class specification is called an
instance, or simply an object. A method is the
object-oriented term for what is sometimes
called a procedure or a function. For example, a
circle may be defined by a radius, a location, 
and a color. What we would like to do with a 
circle is perhaps to draw it, move it and 
calculate its area. With an object-oriented 
programming language we can then define the 
class Circle with attributes (data) radius, 
location and color, and methods draw, move and 
getArea. Once we have the class Circle we can 
create instances of it, i.e. create circle objects. 
Each circle object has its own radius, location 
and color, and can be drawn or moved 
individually. 

In this paper the word interface is used both 
in its general sense (for example: a human- 
machine interface, an application programming 
interface (API)) and in the object-orientation 
sense. In the latter case, an interface is a
collection of methods and usually represents a
certain aspect shared between classes. A class
often implements several interfaces. In our 
example, the Circle class would perhaps 
implement interfaces Drawable, containing the 
method draw, and Movable, containing the 
method move. 
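
The Circle example above can be written out in Java as follows. This is only a sketch of the terminology; the field types, the string returned by draw and the CircleDemo class are assumptions made for illustration and are not taken from ATLAS.

```java
// An interface names an aspect shared between classes: here, "can be drawn".
interface Drawable {
    String draw();
}

// A second aspect: "can be moved".
interface Movable {
    void move(double dx, double dy);
}

// A class bundles attributes (data) with the methods that operate on them.
class Circle implements Drawable, Movable {
    private final double radius;
    private final String color;
    private double x;
    private double y;

    Circle(double radius, double x, double y, String color) {
        this.radius = radius;
        this.x = x;
        this.y = y;
        this.color = color;
    }

    // "Drawing" is reduced to a text description to keep the sketch self-contained.
    @Override
    public String draw() {
        return color + " circle, radius " + radius + ", at (" + x + ", " + y + ")";
    }

    @Override
    public void move(double dx, double dy) {
        x += dx;
        y += dy;
    }

    double getArea() {
        return Math.PI * radius * radius;
    }

    double getX() { return x; }
    double getY() { return y; }
}

class CircleDemo {
    public static void main(String[] args) {
        // Two instances (objects) of the same class, each with its own state.
        Circle small = new Circle(1.0, 0.0, 0.0, "red");
        Circle large = new Circle(2.0, 5.0, 5.0, "blue");
        small.move(3.0, 4.0); // moving one circle leaves the other untouched
        System.out.println(small.draw());
        System.out.println(large.draw());
        System.out.println(large.getArea());
    }
}
```

Code that only needs to draw things can hold a Circle through the Drawable interface and need not know that the object is a circle.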

New concepts, especially method names, are 
set in italics when introduced in the text. 



3. The middleware 

In this section we exemplify the contents of the 
various layers of the middleware as 
implemented in ATLAS and illustrated in Figure 
1. We start at the top with the dialogue 
components layer and proceed towards the 
component APIs. 

3.1. Dialogue components 

A dialogue component is meant to be a powerful 
object that can solve a specific task within a 
dialogue with the user. The task can be to make 
a secure user login, to get the name of an 
existing bank account from the user, or to ask 
for a money amount. To solve such a task, a 
dialogue component must have some task- 
specific domain knowledge, such as knowing 
which customers exist and what accounts they 
have. The domain knowledge is often supported 
by an external database. Dialogue components 
should also be able to detect and recover from 
errors. An error may be an invalid response from 
the user such as the name of a non-existing 
account. If the user gives no response at all, or if 
he asks for help, the dialogue component should 
be able to provide useful help. As part of error
recovery, the dialogue component may repeat or
re-formulate a previously asked question.

The purpose of the dialogue component layer 
is to allow a dialogue engine or the application 
programmer to delegate a well-defined task to
an existing component, and allow the re-use of 
components within and between applications. If 
no suitable component exists for a given task, 
the programmer may modify an existing 
component, create a new one, or choose to solve 
the problem in some other way. In creating 
modified or new dialogue components, the 
programmer has access to all the layers in 
ATLAS. Dialogue components are in concept 
very similar to Nuance's SpeechObjects (Nuance,
2000) and Philips' SpeechBlocks. They also 
seem to be similar to dialogue agents in Jaspis 
(Turunen & Hakulinen, 2000). 

ATLAS itself currently contains only two 
types of dialogue components: login procedures 
and enrolment procedures. The task of a login 
procedure is to find out who the user claims to 
be, and then make sure the claim is valid. A 
login procedure is built from a set of login 
operations, each of which implements a part of 
the login procedure. The login procedure used in 
a normal CTT-bank session, for example, 
contains two login operations. The first is an 
identification operation that asks the user for his 
name and ID-number and then looks for a 
matching customer identity in a database. The 
second is a verification operation that prompts 
the user to utter a randomised password and 
checks the answer for the correct text and for the 
voice characteristics associated with the claimed 
identity. The login procedure used in the 
registration call to CTT-bank, on the other hand, 
contains a single login operation that performs 
both the identification and the verification 
function. This operation asks the user for a 
unique digit sequence issued to him when he 
was asked to make the registration call. 

While login operations implement the details 
of login, the login procedure itself adds 
procedural aspects, such as giving the user a 
certain number of attempts at a given operation. 
It also provides a single API to the dialogue 
engine or the application. An important point 
here is that it is easy for the application 
programmer to exchange one login procedure 
for another: it is just a matter of selecting 
another object for the task. 
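
The division of labour between login operations and a login procedure can be sketched as below. All names here (LoginOperation, LoginProcedure, SequentialLoginProcedure, ScriptedOperation) are hypothetical and chosen for illustration; the paper does not give the ATLAS class names, and the scripted operations stand in for real identification and verification dialogues.

```java
import java.util.List;

// Hypothetical: one part of a login, e.g. identification or verification.
interface LoginOperation {
    boolean attempt();
}

// Hypothetical: the single API seen by the dialogue engine or the application.
interface LoginProcedure {
    boolean login();
}

// Adds the procedural aspect: each operation gets a limited number of attempts.
class SequentialLoginProcedure implements LoginProcedure {
    private final List<? extends LoginOperation> operations;
    private final int maxAttempts;

    SequentialLoginProcedure(List<? extends LoginOperation> operations, int maxAttempts) {
        this.operations = operations;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public boolean login() {
        for (LoginOperation operation : operations) {
            boolean passed = false;
            for (int i = 0; i < maxAttempts && !passed; i++) {
                passed = operation.attempt();
            }
            if (!passed) {
                return false; // attempts exhausted for this operation
            }
        }
        return true;
    }
}

// Stand-in operation that replays a fixed list of outcomes.
class ScriptedOperation implements LoginOperation {
    private final boolean[] outcomes;
    private int calls = 0;

    ScriptedOperation(boolean... outcomes) {
        this.outcomes = outcomes;
    }

    @Override
    public boolean attempt() {
        boolean result = calls < outcomes.length && outcomes[calls];
        calls++;
        return result;
    }

    int calls() { return calls; }
}

class LoginDemo {
    public static void main(String[] args) {
        // "Normal session": identification (fails once) followed by verification.
        LoginProcedure normal = new SequentialLoginProcedure(
                List.of(new ScriptedOperation(false, true), new ScriptedOperation(true)), 3);
        // "Registration call": a single combined operation.
        LoginProcedure registration = new SequentialLoginProcedure(
                List.of(new ScriptedOperation(true)), 3);
        // Exchanging one procedure for another is just selecting another object.
        for (LoginProcedure procedure : List.of(normal, registration)) {
            System.out.println(procedure.login());
        }
    }
}
```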

The task of an enrolment procedure is to 
elicit speech from a customer, build a 
representation of the customer's speech, and 
store the representation in a database. In a CTT- 
bank registration call, a login procedure is first 
used to establish the caller's identity as a valid
customer. An enrolment procedure is then used
to collect ten utterances from the customer. The 
procedure checks that each utterance is spoken 
correctly and asks for a repetition if needed. 
When ten valid utterances have been collected, 
the procedure trains a speaker model for the 
customer's voice and stores it in a database. The 
same enrolment procedure is re-used in the PER 
demonstration system, only modified to exploit 
a graphical display for showing the user what 
utterances to speak. 
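
The enrolment loop described above can be sketched as follows. This is an illustration under stated assumptions, not the ATLAS implementation: the class name EnrolmentProcedure, the promptAndRecognise function (a stand-in for prompting the user and recognising the reply), the repetition limit, and the in-memory map used in place of a speaker-model database are all hypothetical, and "training" is reduced to storing the accepted utterances.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class EnrolmentProcedure {
    // Placeholder for the speaker-model database.
    private final Map<String, List<String>> speakerModels = new HashMap<>();
    private final int maxRepetitions;

    EnrolmentProcedure(int maxRepetitions) {
        this.maxRepetitions = maxRepetitions;
    }

    // Collects one valid utterance per prompt, asking for a repetition when
    // the recognised text does not match the prompted text.
    boolean enrol(String customerId, List<String> prompts,
                  Function<String, String> promptAndRecognise) {
        List<String> validUtterances = new ArrayList<>();
        for (String expected : prompts) {
            boolean spokenCorrectly = false;
            for (int attempt = 0; attempt <= maxRepetitions && !spokenCorrectly; attempt++) {
                spokenCorrectly = expected.equals(promptAndRecognise.apply(expected));
            }
            if (!spokenCorrectly) {
                return false; // give up; no model is stored
            }
            validUtterances.add(expected);
        }
        // A real system would train a speaker model here.
        speakerModels.put(customerId, validUtterances);
        return true;
    }

    boolean hasModel(String customerId) {
        return speakerModels.containsKey(customerId);
    }
}

class EnrolmentDemo {
    public static void main(String[] args) {
        EnrolmentProcedure procedure = new EnrolmentProcedure(2);
        // A "user" who always repeats the prompt correctly.
        boolean enrolled = procedure.enrol("customer-1", List.of("1 2 3", "4 5 6"), p -> p);
        System.out.println(enrolled + " " + procedure.hasModel("customer-1"));
    }
}
```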

Within the CTT-bank application, another set 
of dialogue components has been developed. 
They all derive from the same component called 
"complex question", and their respective task is 
to get a money amount, to get the name of a 
valid account, and to get the answer to a yes/no- 
question. These components were created inside 
the application since they were not available in 
ATLAS, but they are candidates for being 
moved into ATLAS to make them easily 
accessible from other ATLAS-based applications.

3.2. High-level primitives 

The high-level primitives layer currently 
contains an ask method and a simplified ask 
method. Both methods present an optional 
prompt from a given prompt text and record and 
process the answer using a set of audio 
processors (defined in the next section). They
normally depend on methods in the services
layer, such as say and listen (also defined and
described in the next section), for their
implementation. The simplified ask method returns the
top-scoring text hypothesis for the spoken 
answer, while the ordinary ask method gives 
access to the results of all participating audio 
processors including multiple text hypotheses 
and speaker information. 
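
The relation between the two ask methods can be sketched as below. The types Hypothesis, AskResult, SpeechServices and HighLevelPrimitives, and the method name askSimple, are assumptions introduced for this example; the paper names only the ask, say and listen methods, not their signatures.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical value types; ATLAS's own result classes are not shown in the paper.
final class Hypothesis {
    final String text;
    final double score;

    Hypothesis(String text, double score) {
        this.text = text;
        this.score = score;
    }
}

final class AskResult {
    final List<Hypothesis> hypotheses; // multiple text hypotheses
    final String speakerInfo;          // e.g. output of a speaker verifier

    AskResult(List<Hypothesis> hypotheses, String speakerInfo) {
        this.hypotheses = hypotheses;
        this.speakerInfo = speakerInfo;
    }
}

// Stand-in for the services layer.
interface SpeechServices {
    void say(String promptText);
    AskResult listen();
}

class HighLevelPrimitives {
    private final SpeechServices services;

    HighLevelPrimitives(SpeechServices services) {
        this.services = services;
    }

    // Ordinary ask: optional prompt, then the complete result.
    AskResult ask(String promptText) {
        if (promptText != null) {
            services.say(promptText);
        }
        return services.listen();
    }

    // Simplified ask: only the top-scoring text, or null if nothing was recognised.
    String askSimple(String promptText) {
        return ask(promptText).hypotheses.stream()
                .max(Comparator.comparingDouble((Hypothesis h) -> h.score))
                .map(h -> h.text)
                .orElse(null);
    }
}

class AskDemo {
    public static void main(String[] args) {
        SpeechServices scripted = new SpeechServices() {
            @Override public void say(String promptText) {
                System.out.println("SAY: " + promptText);
            }
            @Override public AskResult listen() {
                return new AskResult(
                        List.of(new Hypothesis("savings", 0.4), new Hypothesis("salary", 0.9)),
                        "speaker-1");
            }
        };
        System.out.println(new HighLevelPrimitives(scripted).askSimple("Which account?"));
    }
}
```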

3.3. Services 

The services layer provides speech and media 
input and output capabilities through play, say 
and listen methods, plus specialised retrieval 
methods for speech technology components 
(resources) of pre-defined types. 

3.3.1. Speech and media output
The play method loads media data from file, 
sends it to one or more media devices, and 
makes the media devices render it. The say 
method takes a text argument and sends it to a 
text-to-speech (TTS) component to generate a 
media stream. It then sends the generated media 
stream to one or more media devices like the 
play method. Note that both the play and the say
methods can handle multi-modal media output
devices, such as speech with face animation. In 
this case the generated media stream contains 
two channels, an audio channel and a channel 
with parameter data for face animation. 

3.3.2. Speech and media input 
The listen method is more complex than the play 
and say methods. Its task is to record a segment 
of audio from a media device and process it. The 
processing is done by an optional speech 
detector and zero or more audio processors. An 
audio processor is a speech recogniser, speaker 
verifier, or any other object that inputs audio and 
outputs a result. The configuration of speech 
detector and audio processors to be used by the 
listen method is defined by a listener profile, 
central to the design of the speech input 
mechanism. The listener profile can specify 
dependencies between audio processors, such 
that one processor may wait for the output of 
another processor and use it as input to its own 
processing. For example, a speaker verifier A 
may need the output of a particular speech 
recogniser B to segment an utterance and 
another speech recogniser C for deriving an 
identity claim (in the case when a single 
utterance is used both for identification and 
verification of an identity). A's dependency on 
B and C is then specified in the listener profile 
as A(B,C). In addition to audio processors given 
by the listener profile, the recorded audio 
segment can be saved to a file. 
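
One way to read the A(B,C) notation is as a dependency graph from which an execution order can be derived. The sketch below is an assumption about how such a profile could be represented, not a description of ATLAS internals: the ListenerProfile class, its add and executionOrder methods, and the use of plain strings as processor names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ListenerProfile {
    // processor name -> names of processors whose output it waits for
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    // add("A", "B", "C") records the dependency written A(B,C) in the text.
    ListenerProfile add(String processor, String... dependsOn) {
        dependencies.put(processor, Arrays.asList(dependsOn));
        for (String dependency : dependsOn) {
            dependencies.putIfAbsent(dependency, Collections.emptyList());
        }
        return this;
    }

    // Depth-first ordering in which every processor follows its dependencies.
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String processor : dependencies.keySet()) {
            visit(processor, done, inProgress, order);
        }
        return order;
    }

    private void visit(String processor, Set<String> done,
                       Set<String> inProgress, List<String> order) {
        if (done.contains(processor)) {
            return;
        }
        if (!inProgress.add(processor)) {
            throw new IllegalStateException("cyclic dependency at " + processor);
        }
        for (String dependency : dependencies.get(processor)) {
            visit(dependency, done, inProgress, order);
        }
        inProgress.remove(processor);
        done.add(processor);
        order.add(processor);
    }
}

class ListenerProfileDemo {
    public static void main(String[] args) {
        // Speaker verifier A needs recogniser B (segmentation) and C (identity claim).
        ListenerProfile profile = new ListenerProfile().add("A", "B", "C");
        System.out.println(profile.executionOrder()); // prints [B, C, A]
    }
}
```

Processors without mutual dependencies, such as B and C here, could equally well run in parallel; the sequential order is only the simplest valid schedule.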

The listen method is supported by three other 
methods: A preparatory method sets up media 
streams and prepares audio processors for a new 
utterance according to a listener profile. A call 
to the preparatory method is followed by a call
to the listen method itself, which triggers the start
of the actual recording (the "listening"). A group
of methods can then be used to retrieve results 
from one or more of the audio processors. When 
asked for results from multiple audio processors, 
these methods do some data fusion. Result 
retrieval methods normally block until results 
from all audio processors are available. A 
maximum processing time can be specified, 
however. After this time has elapsed, a method 
will return with the results available at the time. 
When all the results have been retrieved, a 
clean-up method should be called to release 
resources allocated for the listen operation. 
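
The blocking behaviour of result retrieval, including the maximum processing time, can be sketched with standard Java futures. The ListenOperation class and its start, getResults and cleanUp methods are hypothetical names for this example, and each audio processor is represented only by a CompletableFuture that will eventually hold its result as text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ListenOperation {
    // processor name -> pending result of that processor
    private final Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();

    // Stand-in for starting an audio processor on the recorded segment.
    void start(String processorName, CompletableFuture<String> result) {
        pending.put(processorName, result);
    }

    // Blocks until all processors are done or maxMillis has elapsed, then
    // returns whatever results are available at that time.
    Map<String, String> getResults(long maxMillis) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(pending.values().toArray(new CompletableFuture[0]));
        try {
            all.get(maxMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Timed out or a processor failed: fall through with partial results.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Map<String, String> available = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<String>> entry : pending.entrySet()) {
            CompletableFuture<String> result = entry.getValue();
            if (result.isDone() && !result.isCompletedExceptionally()) {
                available.put(entry.getKey(), result.join());
            }
        }
        return available;
    }

    // Releases what was allocated for this listen operation.
    void cleanUp() {
        pending.values().forEach(result -> result.cancel(true));
        pending.clear();
    }
}

class ListenDemo {
    public static void main(String[] args) {
        ListenOperation operation = new ListenOperation();
        operation.start("recogniser", CompletableFuture.completedFuture("transfer money"));
        operation.start("verifier", new CompletableFuture<>()); // never finishes
        System.out.println(operation.getResults(100)); // only the recogniser result
        operation.cleanUp();
    }
}
```

The data fusion mentioned in the text is not modelled here; results are simply returned per processor.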

3.3.3. Resource retrieval

Specialised retrieval methods are provided for 
speech technology components (resources) of 
each pre-defined type. Pre-defined types are 
currently speech recognition engine, speaker
verification engine, speech detector, text-to-speech
engine, sound coder, media stream
player, media stream recorder, graphical
display, SQL database connection, and file-oriented
database. Additional and more
specialised types of media stream players and 
recorders have also been defined, including 
telephony device, desktop audio device, and 
audio-visual agent. Each resource retrieval 
method comes in two versions: one to retrieve 
the default resource of a given type and one to 
retrieve a named resource. 
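
The two versions of each retrieval method can be illustrated as below. The names Resource, TextToSpeechEngine, ResourceRegistry and getTextToSpeechEngine are hypothetical, and treating the first registered resource of a type as the default is an assumption of this sketch, since the paper does not say how the default is chosen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical common abstraction for attached components.
interface Resource {
    String name();
}

// One pre-defined resource type, reduced to a name for illustration.
class TextToSpeechEngine implements Resource {
    private final String name;

    TextToSpeechEngine(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }
}

class ResourceRegistry {
    // resource type -> (resource name -> resource), in registration order
    private final Map<Class<?>, Map<String, Resource>> byType = new LinkedHashMap<>();

    <T extends Resource> void register(Class<T> type, T resource) {
        byType.computeIfAbsent(type, t -> new LinkedHashMap<>())
              .put(resource.name(), resource);
    }

    // Generic retrieval: the same code path for every resource type.
    <T extends Resource> T getDefault(Class<T> type) {
        Map<String, Resource> named = byType.get(type);
        if (named == null || named.isEmpty()) {
            return null;
        }
        return type.cast(named.values().iterator().next());
    }

    <T extends Resource> T getNamed(Class<T> type, String name) {
        Map<String, Resource> named = byType.get(type);
        return named == null ? null : type.cast(named.get(name));
    }

    // Specialised retrieval: one pair of methods per pre-defined type.
    TextToSpeechEngine getTextToSpeechEngine() {
        return getDefault(TextToSpeechEngine.class);
    }

    TextToSpeechEngine getTextToSpeechEngine(String name) {
        return getNamed(TextToSpeechEngine.class, name);
    }
}

class ResourceDemo {
    public static void main(String[] args) {
        ResourceRegistry registry = new ResourceRegistry();
        registry.register(TextToSpeechEngine.class, new TextToSpeechEngine("tts-sv"));
        registry.register(TextToSpeechEngine.class, new TextToSpeechEngine("tts-en"));
        System.out.println(registry.getTextToSpeechEngine().name());
        System.out.println(registry.getTextToSpeechEngine("tts-en").name());
    }
}
```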

3.4. Component interaction 

The component interaction layer contains 
resource handling, media stream connections, 
and several structures for representing various 
types of information. 

In resource handling, all components 
attached to ATLAS via a component API are 
abstracted to a resource, and are collected in a 
resource bundle. The life of a resource starts 
when it is created and ends when it is closed. 
While alive, its operation may be monitored to 
detect if the functionality is lost (the resource is 
down). Whenever the application or an object 
within ATLAS needs access to an attached 
component, it retrieves a handle to the 
component's API through the component's 
resource interface. This layer handles all 
resource types in the same way, while the 
services layer provides specialised retrieval 
methods for each resource type. 

A media stream consists of one or more 
TCP/IP-based media channels. The end-point of 
a channel is a TCP socket. By convention, the 
media producer connects to a server socket 
opened by the media consumer. When the 
connection has been established, the producer 
starts transmitting data in a format specified by 
the consumer. In most cases, the media stream 
has a single channel containing audio data. The 
only current example of a multi-channel stream 
in ATLAS is the stream from a text-to-speech 
synthesiser to an audio-visual agent, where a 
second channel contains parameter data for the 
face animation. Media streams are created on a 
per-utterance basis. 
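
The connection convention for a single media channel 
can be sketched with plain Java sockets. The sketch 
below is an illustration only: the class and method 
names are hypothetical, the loopback address is used 
for simplicity, and the negotiation of the data 
format specified by the consumer is left out. 

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical sketch of one TCP media channel; not ATLAS code. */
final class MediaChannelSketch {

    /** Sends audio over one channel and returns the bytes the consumer received. */
    static byte[] transfer(byte[] audio) {
        // The media consumer opens the server socket (port 0 means any free port).
        try (ServerSocket consumer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = consumer.getLocalPort();
            // The media producer connects to the consumer and transmits its data.
            Thread producer = new Thread(() -> {
                try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
                     OutputStream out = socket.getOutputStream()) {
                    out.write(audio);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            try (Socket socket = consumer.accept();
                 InputStream in = socket.getInputStream()) {
                byte[] buffer = new byte[4096];
                for (int n; (n = in.read(buffer)) != -1; ) {
                    received.write(buffer, 0, n);
                }
            }
            producer.join();
            return received.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Closing the producer's socket marks the end of the 
data, which fits a stream that lives for a single 
utterance. 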

Several types of information are passed 
between components, the ATLAS layers, and 
the application. The component interaction layer 
provides data structures to hold such 
information. An example is the utterance 
information structure that holds information 
about the contents of a spoken utterance. This 
may be the output of a speech recogniser and 
may be used as input by the application itself or 
by another audio processor, such as a speaker 
verifier or a parser. Currently the utterance 
information structure supports scored text 
hypotheses, word timing information, and 
speaker information, but could be extended to 
support for instance syntactic and semantic 
information. 
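
A data structure of this kind might be sketched as 
below. The field and class names are hypothetical, 
and the convention that a higher score is better is 
an assumption; the text does not specify how scores 
are ordered. 

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an utterance information structure. */
final class UtteranceInfoSketch {

    /** One scored text hypothesis. */
    static final class Hypothesis {
        final String text;
        final double score; // assumption: higher means more likely
        Hypothesis(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    /** Timing of one word, in milliseconds from the start of the utterance. */
    static final class WordTiming {
        final String word;
        final long startMs;
        final long endMs;
        WordTiming(String word, long startMs, long endMs) {
            this.word = word;
            this.startMs = startMs;
            this.endMs = endMs;
        }
    }

    final List<Hypothesis> hypotheses = new ArrayList<>();
    final List<WordTiming> wordTimings = new ArrayList<>();
    String speaker; // null while the speaker is unknown

    /** Returns the hypothesis with the highest score, or null if there is none. */
    Hypothesis best() {
        Hypothesis best = null;
        for (Hypothesis h : hypotheses) {
            if (best == null || h.score > best.score) {
                best = h;
            }
        }
        return best;
    }
}
```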

3.5. Component APIs 

A component API has been defined for each of 
the pre-defined resource types listed in section 
3.3. Some of the APIs are complex in that they 
are represented by several interfaces. The speech 
recogniser API, for instance, consists of a 
recogniser factory, a recogniser engine and a 
recogniser utterance. They are related in such a 
way that a factory creates engines, and engines 
process utterances (segments of audio data). 
Furthermore, the recogniser utterance interface 
uses the utterance information structures defined 
in the component interaction layer to represent 
its recognition results. The recogniser engine 
interface also extends the audio processor 
interface described in section 3.3. Similarly, the 
speaker verification API includes a verifier 
engine and a verification utterance. These are 
based on the SVAPI 8 standard speaker 
verification API. Besides the functionality 
covered by SVAPI, the speaker verification API 
in ATLAS has been extended to handle ATLAS- 
type media streams, and to have the verifier 
engine extend the audio processor interface. 

8 SVAPI is a result of collaborative efforts of many 
companies, including Novell, Dialogic, IBM, 
Motorola and many others. 

The TTS API also contains a factory 
interface and an engine interface. Utterances are 
handled with a method call in the TTS engine, 
rather than with a dedicated utterance object. 
The synthesis method and language are 
specified when a TTS engine is created and 
cannot be changed later. Voice properties for the 
selected synthesis method, such as pitch level, 
can be changed, however. An application can 
change voice or language by creating multiple 
TTS engines and switching between them. 
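
The division between properties fixed at creation 
and properties that can change later might be 
sketched as follows. All names are hypothetical 
stand-ins rather than the ATLAS TTS API, and the 
speak method only returns a description instead of 
producing audio. 

```java
/** Hypothetical sketch; class and method names are not from ATLAS. */
final class TtsSketch {

    static final class Engine {
        private final String method;   // synthesis method, fixed at creation
        private final String language; // language, fixed at creation
        private int pitchHz = 100;     // voice property, may be changed later

        Engine(String method, String language) {
            this.method = method;
            this.language = language;
        }

        void setPitchHz(int pitchHz) {
            this.pitchHz = pitchHz;
        }

        /** An utterance is handled by a method call, not by a dedicated object. */
        String speak(String text) {
            return method + "/" + language + "@" + pitchHz + "Hz: " + text;
        }
    }

    /** Synthesis method and language are chosen only here, at creation time. */
    static final class Factory {
        Engine create(String method, String language) {
            return new Engine(method, language);
        }
    }
}
```

An application wanting both Swedish and English 
output would, in this sketch, call create twice and 
keep both engines, choosing one per prompt. 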

4. The resource layer 

As already mentioned, the resource layer refers 
to a collection of (speech technology) components 
used by an application. In this chapter 
we first elaborate on how components can be 
connected to ATLAS, and then list what 
components are currently available. Let us 
emphasise that the components themselves are 
not part of ATLAS, and that ATLAS is rather 
useless without a set of good components. 

4.1. Component implementation 

A component API, as the lower side of ATLAS, 
specifies how an application or an ATLAS layer 
can interact with a component, while leaving a 
lot of freedom for how the component is 
actually implemented. Since ATLAS is 
implemented in Java and the component APIs 
are defined in terms of Java APIs, the 
component as such must be a Java object. But 
what if we already have a speech recogniser 
engine written in, for instance, C++? Then we 
can create a pseudo-implementation of the 
engine in Java that uses the existing C++ 
program to do the actual job. Figure 2 illustrates 
four examples of how a speech recognition 
engine (labelled ASR in the figure) can be 
connected to ATLAS through the component 
API. 

In the first example, the engine already has a 
Java implementation of the component API. 
Either the engine is coded in Java or it is coded 
in a native language (C/C++) but has a Java 
wrapper using the Java Native Interface, JNI. 

In the second example, the engine supports 
an API other than the ATLAS component API. This 
may be an industry standard API, such as 
Microsoft's SAPI 9 or Sun's JSAPI 10 , or an 
engine vendor-specific API (for example 
Philips 11 API). Provided the ATLAS API can be 
mapped to the other API, a pseudo- 
implementation of the ATLAS API could be 
created that operates as a bridge between the 
two APIs. Such a bridge can possibly be used 
with other engines that support the same 
standard API. 
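
The bridge idea is essentially the adapter pattern. 
The sketch below uses two invented interfaces, one 
standing in for an ATLAS component API and one for 
some other API; neither reproduces the real ATLAS, 
SAPI or JSAPI interfaces. 

```java
/** Hypothetical sketch of a bridge between two recogniser APIs. */
final class BridgeSketch {

    /** Stand-in for a component API on the ATLAS side. */
    interface Recogniser {
        String recognise(byte[] audio);
    }

    /** Stand-in for another (standard or vendor-specific) engine API. */
    interface OtherApiEngine {
        void feed(byte[] samples);
        String bestResult();
    }

    /** The pseudo-implementation: implements one API by calling the other. */
    static final class Bridge implements Recogniser {
        private final OtherApiEngine engine;

        Bridge(OtherApiEngine engine) {
            this.engine = engine;
        }

        @Override
        public String recognise(byte[] audio) {
            engine.feed(audio);         // map the call onto the other API
            return engine.bestResult(); // map the result back
        }
    }
}
```

Because the bridge depends only on the other API's 
interface, the same bridge class can wrap any engine 
that implements that interface. 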

In the previous two examples, the engine is 
likely to execute in the same process as ATLAS 
itself, while in the remaining two examples the 
engine may be implemented as a server in a 
separate process. The third example illustrates a 
plain server implementation, where a small 
pseudo-implementation of the ATLAS API 
communicates with the server through some 
inter-process communication mechanism, such 
as the Common Object Request Broker 
Architecture 12 (CORBA), Java Remote Method 
Invocation 13 (RMI), or the CTT Broker 14 . 

The fourth and final example in Figure 2 
indicates the possibility to interface to an engine 
that is integrated into another speech technology 
system, such as the DARPA Communicator. 
This could include interfacing several other 
Communicator engines (text-to-speech engine, 
parser, etc.) at the same time through a single 
bridging mechanism. Alternatively, each single 
engine could be attached directly, as in 
examples two and three. 

4.2. Available components 

In this section we list the currently available 
components that implement an ATLAS 
component API and thus can be used with 
ATLAS. We pay special attention to how each 
component is connected to ATLAS and give 
references for the underlying technology, but 
otherwise keep descriptions very brief. More 
detailed descriptions for some of the 
components can be found in Melin et al. (2001). 

Three components are available as internal 
resources executing in the same virtual machine 
as ATLAS (Figure 2, example 1): an energy and 
zero-crossing rate based speech detector, a 
sound coder and a file-oriented database. (The 
two latter components are also available as CTT 
Broker servers, see below.) 

Two components use an industry standard 
API (JDBC, which has been chosen as the 
component API for SQL databases in ATLAS) 
to access an SQL database (Figure 2, example 
2): one interfaces to a MySQL 16 database and 
the other to a Borland InterBase 17 database. Note 
that these ATLAS components add very little to 
the JDBC driver itself; each merely defines the 
name of the driver and loads the driver into the 
virtual machine. 

The remaining components are implemented 
as clients to CTT Broker servers (Figure 2, 
example 3), including: 

• a text-to-speech component using RULSYS 
(Carlson et al., 1982) for text-to-phone 
conversion plus GLOVE (Carlson et al., 
1990) or MBROLA (Dutoit et al., 1996) 
synthesisers. Several Swedish and English 
voices are currently available, including 
Lukas (Filipsson & Bruce, 1997) and the 
Infovox 8 voices Ingmar, Annmarie and 
Roger. It can generate media streams for 
multi-modal output (face and voice) 
(Beskow, 1995). 

• a StarLite speech recogniser (Ström, 1996). 
Acoustic triphone models trained on 
SpeechDat databases (Höge et al., 1997; 
Elenius, 2000) are available for Swedish and 
English (Salvi, 1998; Lindberg et al., 2000). 

• a speaker verifier based on GIVES. Text- 
dependent modes for Swedish and English 
are available (Melin & Lindberg, 1999; 
Melin, to appear). 

• a media device with animated agent output 
and audio-only input (simplex mode) 
(Beskow, 1995; Gustafson et al., 1999). 

• a media device, "digitiser", with desktop 
(simplex) audio output and input based on 
the Snack toolkit (Sjölander & Beskow, 
2000). 

• an ISDN media device with telephony call 
handling and (simplex) audio output and 
input. 

• a sound coder component. Performs audio 
format conversion, speech parameterisation 
for the speech recogniser and can fork audio 
streams. Also available as an internal 
component. 

• a file-oriented database. Also available as an 
internal component. 

In addition to the above CTT Broker-based 
components, a registry component interfaces a 
registry in the Broker that keeps track of host- 
specific servers. 

The text-to-speech, speech recognition, 
agent, digitiser, sound coder and file database 
Broker servers were originally developed as part 
of other projects at CTT. They have recently 
been improved and adapted to work well with 
ATLAS. 

4.3. The CTT Broker 

The CTT Broker (Lewin, 1997) is an 
architecture for inter-process communication 
that is helpful when building modular and 
distributed systems. It was initially developed 
within the ENABL (Bickley & Hunnicut, 2000) 
and August (Gustafson et al., 1999) projects. 
The Broker is used by ATLAS to communicate 
with several speech technology components 
implemented as Broker servers, as indicated 
above. It is currently also used in the AdApt 
dialogue system (Gustafson et al., 2000), where, 
on the other hand, ATLAS itself is not used. 

The primary function of the Broker is to pass 
message strings between servers through TCP 
ports. To manage this, it also keeps track of 
what servers are connected. The basic, 
lightweight protocol uses a short header for the 
Broker's own use attached to the actual message 
string. The header includes a message type 
indicator and address information, where 
message types exist to connect and disconnect a 
server, to send procedure or function calls to a 
server, and to send a return or error value in 
response to a function call. It is up to each server 
to define syntax and semantics for the actual 
message strings - the Broker simply passes this 
string from sender to receiver without 
interpreting or altering it. The string-based 
message protocol and the use of TCP-port-based 
connections make the operation of the Broker 
platform independent, in that servers can run in 
any programming environment and operating 
system that supports TCP connections. The 
Broker itself is implemented in Java and can 
therefore run on any platform that supports Java. 
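
The idea of a short routing header attached to an 
uninterpreted message string can be illustrated as 
follows. The wire layout used here (one header line 
with type, sender and receiver, followed by the 
message string) is invented for the example and is 
not the actual CTT Broker format. 

```java
/** Hypothetical sketch of a header-plus-string message; not the real protocol. */
final class BrokerMessageSketch {

    /** Message types named in the text: connect, disconnect, calls and replies. */
    enum Type { CONNECT, DISCONNECT, CALL_PROCEDURE, CALL_FUNCTION, RETURN, ERROR }

    final Type type;
    final String sender;   // assumption: server names contain no spaces
    final String receiver;
    final String body;     // opaque to the broker; only the servers interpret it

    BrokerMessageSketch(Type type, String sender, String receiver, String body) {
        this.type = type;
        this.sender = sender;
        this.receiver = receiver;
        this.body = body;
    }

    /** Header line first, then the message string exactly as given. */
    String encode() {
        return type.name() + " " + sender + " " + receiver + "\n" + body;
    }

    /** Splits at the first newline only, so the body is never altered. */
    static BrokerMessageSketch decode(String wire) {
        int headerEnd = wire.indexOf('\n');
        String[] header = wire.substring(0, headerEnd).split(" ");
        return new BrokerMessageSketch(Type.valueOf(header[0]), header[1], header[2],
                                       wire.substring(headerEnd + 1));
    }
}
```

A broker built on such a format needs to read only 
the header line to route a message, which is what 
keeps the protocol independent of each server's own 
message syntax. 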

A secondary function of the Broker is to start 
servers on demand, and to detect when a server 
is closed. It uses a database of startable servers 
that defines what servers can be started and how 
they are started. 

To aid the creation of servers, software 
libraries have been created for several 
programming languages, including Java, C, 
C++, Tcl, Perl and Prolog. With these libraries a 
server can register itself with the Broker and 
make calls to a remote server using constructs in 
the programming language in use, rather than 
handling low-level TCP connections directly. 
For example, with the Java library a server 
creates an instance of the BrokerClient class and 
calls the instance's connect method. It can then 
make remote calls to another server by calling a 
method callFunc, giving the name of the remote 
server and the message string as arguments. The 
callFunc method blocks until the Broker sends a 
reply and then either returns a value or throws 
an exception. 
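
The call pattern just described can be mimicked with 
an in-memory stand-in. The sketch deliberately does 
not use the real BrokerClient class, whose exact 
signatures are not given here; FakeBroker and Client 
are hypothetical, and only the connect and callFunc 
method names follow the text. 

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for the connect/callFunc usage pattern. */
final class BrokerCallSketch {

    /** Stand-in broker: maps server names to handlers of message strings. */
    static final class FakeBroker {
        private final Map<String, Function<String, String>> servers = new HashMap<>();

        void register(String name, Function<String, String> handler) {
            servers.put(name, handler);
        }
    }

    /** Stand-in client: connect first, then make blocking function calls. */
    static final class Client {
        private final FakeBroker broker;
        private boolean connected;

        Client(FakeBroker broker) {
            this.broker = broker;
        }

        void connect() {
            connected = true;
        }

        /** Returns the reply value, or throws if the call cannot be completed. */
        String callFunc(String server, String message) {
            if (!connected) {
                throw new IllegalStateException("not connected");
            }
            Function<String, String> handler = broker.servers.get(server);
            if (handler == null) {
                throw new IllegalArgumentException("no such server: " + server);
            }
            return handler.apply(message);
        }
    }
}
```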

In addition to the basic call functionality, 
some of the libraries (currently Java, C++ and 
Tcl) provide a parser for the contents of a 
message string that can route calls to classes, 
instances and attributes inside a server. 
Language constructs are also available to 
represent classes and instances in the server. 
With this mechanism, the concept of remote 
objects is supported. The remote object concept, 
the parser, and the corresponding message 
structure are entirely optional, but are used by 
all servers currently available through the 
ATLAS platform. 

Figure 3. Application structure implemented by ATLAS application support classes. The application 
object creates sessions upon incoming calls from a terminal. The Control Center GUI is optional and 
any application can run without it. 

Using the remote objects concept, an event 
mechanism has been implemented using a 
publication metaphor. A server creates a 
publication for publishing certain information, 
and servers subscribing to the publication get 
an update message every time new information 
is published. This event mechanism is, for 
instance, used by the Broker itself to make server 
connection status information available. By 
subscribing to such a publication, an application 
can, for instance, know when a server is lost. 
ATLAS uses this feature with all Broker-server-based 
resources. When a Broker-server-based 
resource is created, ATLAS automatically 
subscribes to status information for the 
corresponding server connection. ATLAS is 
then notified if the server is lost, and can take 
measures, for instance to re-create the resource. 
The event mechanism is currently available in 
the Java and C++ libraries only. 
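
The publication metaphor of the event mechanism 
corresponds to a plain publish/subscribe structure. 
The following local, in-process sketch uses 
hypothetical names and leaves out the remote object 
machinery through which the Broker libraries deliver 
updates between servers. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-process sketch of a publication with subscribers. */
final class PublicationSketch<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    /** Registers a subscriber that is notified of every later update. */
    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    /** Publishes new information to all current subscribers, in order. */
    void publish(T information) {
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(information);
        }
    }
}
```

A resource wrapper could subscribe to a connection 
status publication in this way and react to a "lost" 
update by scheduling the resource for re-creation. 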

The CTT Broker architecture has similarities 
with other inter-process communication 
architectures 19 . The Galaxy-II hub, for instance, 
also organises servers in a star topology where 
all server-to-server messages pass through the 
hub. The hub has a programmable controller 
function, however, which the CTT Broker lacks. 
CORBA and Java RMI provide support for 
manipulating remote objects almost as if they 
were local objects. Similar functionality can be 
achieved with the Broker and its support 
libraries. It is left to the Broker server designer, 
however, to provide client-side APIs that allow 
remote objects to be manipulated as if they were 
local objects. Such client-side APIs (stubs) are 
generated automatically with CORBA and Java 
RMI. 

Audio and other binary streams are usually 
not sent through the CTT Broker. Instead, 
servers communicate through the Broker to 
set up direct connections where binary data is 
transmitted. This is the same as in Galaxy-II. 



19 URL references for CORBA, Java RMI and 
Galaxy-II were given in section 4.1. 



5. Applications 

5.1. Application support 

Apart from providing an implementation of the 
middleware illustrated in Figure 1, ATLAS 
provides a set of support classes for the 
application-dependent layer of a system. This 
includes interfaces and super classes for 
application, session and terminal classes. 

The provided super classes can be used to 
create applications with the structure illustrated 
in Figure 3. The idea is that the application 
corresponds to an object that is created once. 
The application can then create session objects 
whose lives correspond to physical sessions of 
interaction with a user through a terminal. It is 
usually the session object that does something 
interesting using the speech technology API in 
ATLAS. The current implementation limits the 
number of concurrent sessions to one, for 
reasons of simplicity. We believe this to be 
sufficient for most research situations. 

Each session object is connected to a single 
terminal. A terminal may be telephony based, in 
which case the session naturally corresponds to 
a telephone call, or desktop based. With a 
desktop-based terminal the session metaphor 
does not come as naturally as with a telephony- 
based terminal. It is left to the implementation of 
the terminal object to decide when a session 
should start, and to the terminal or the session to 
decide when a session should be terminated. 
Common to all terminals is that they provide a 
means for initiating a session with the 
application, and that they are associated with an 
audio output and an audio input device. 
Optionally, a terminal may also have a means to 
close a session and may be associated with a 
display. 

One of the key features of ATLAS and the 
arrangement illustrated in Figure 3 is that 
applications and sessions can interact with any 
type of terminal transparently (as long as they 
do not require particular properties of specific 
terminals). In CTT-bank, for example, a session 
normally interacts with the user via a telephony- 
based terminal, but it can also use a desktop- 
based terminal. In fact, this was often exploited 
during the development phase of the project. 
The desktop terminal could even be extended 
with output through an audio-visual agent. To 
take full advantage of multi-modal output, 
however, the session code needs to be modified 
to send requests for animated gestures to the 
agent, to make the agent look more alive. With 
such modifications the session would still run 
with a telephony-based terminal, since gesture 
requests are simply discarded if the terminal 
cannot visualise them. 
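
Terminal transparency of this kind can be sketched 
with an interface whose gesture request does nothing 
by default. The interface, classes and method names 
below are hypothetical and are not the ATLAS 
application support classes. 

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of session code running against different terminals. */
final class TerminalSketch {

    interface Terminal {
        void playAudio(String prompt);

        /** A terminal that cannot visualise gestures discards the request. */
        default void gesture(String name) { }
    }

    static final class TelephoneTerminal implements Terminal {
        final List<String> log = new ArrayList<>();

        @Override
        public void playAudio(String prompt) {
            log.add("audio:" + prompt);
        }
    }

    static final class AgentTerminal implements Terminal {
        final List<String> log = new ArrayList<>();

        @Override
        public void playAudio(String prompt) {
            log.add("audio:" + prompt);
        }

        @Override
        public void gesture(String name) {
            log.add("gesture:" + name);
        }
    }

    /** Session code written once; it runs unchanged with either terminal. */
    static void greet(Terminal terminal) {
        terminal.gesture("smile");
        terminal.playAudio("Welcome");
    }
}
```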

Besides sending audio through the audio 
output device associated with the terminal 
connected to a given session, the application or 
the session may choose to add other output 
devices. This is exploited in PER, where system 
output is always sent to the audio-visual agent 
sitting by the gate, even if the current session 
interacts with a user through a telephone. This is 
to indicate to a newly arrived person at the gate 
that the system is currently busy talking to 
somebody else. 

ATLAS has been internationalised 20 with 
respect to the language spoken within the 
application. That is, assuming the application 
dependent part of an application is also 
internationalised, the application can be 
localised 21 to a new language. Localisation in 
this case involves translating text elements 
related to generating system prompts and 
interpreting user responses, and adding 
resources for the new language (or making sure 
the existing ones support the new language). All 
such text elements within ATLAS, i.e. in its 
dialogue components, have been localised to 
Swedish and English, and both languages are 
supported among speech technology components 
listed in section 4.2. Basically, 
internationalisation means separating text 
elements from program code. Text elements are 
stored in text files that are read by the program 
at runtime. There are one or more text files per 
language, and text elements are localised by 
creating a corresponding set of text files for the 
new language. 

20 Internationalisation is the process of designing an 
application so that it can be adapted to various 
languages and regions without engineering changes. 

21 Localisation is the process of adapting software for 
a specific region or language by adding locale- 
specific components and translating text. 

An important additional advantage of 
separating text elements from program code is 
that changing system prompts, and especially 
hand tuning prompts for optimal synthesis 
output, requires no knowledge of the 
programming language used to code the 
application. Having all prompt texts collected in 
one place also provides for a good overview. 
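
Keeping text elements outside the program code can 
be sketched with the standard java.util.Properties 
class. The key=value layout, the class name and the 
prompt key are assumptions for illustration; the 
actual text file format used by ATLAS is not 
described here. To stay self-contained, the sketch 
reads the file contents from a string. 

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of per-language prompt texts kept outside the code. */
final class PromptTextsSketch {
    private final Map<String, Properties> byLanguage = new HashMap<>();

    /** Loads key=value text elements for one language (normally from a file). */
    void load(String language, String fileContents) {
        Properties texts = new Properties();
        try {
            texts.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byLanguage.put(language, texts);
    }

    /** Looks up a prompt text; the program code refers only to the key. */
    String prompt(String language, String key) {
        Properties texts = byLanguage.get(language);
        if (texts == null) {
            throw new IllegalArgumentException("language not localised: " + language);
        }
        return texts.getProperty(key);
    }
}
```

Localising to a new language then amounts to one 
more load call with a translated file, and a prompt 
can be hand-tuned by editing that file alone. 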

A graphical user interface (GUI) to the 
application and the resource bundle is provided. 
It is called the Control Center and is entirely 
optional - any application can run without it. 
The application part of the GUI provides a 
possibility to start and stop the application. It 
also has a message pane that shows when the 
application was started or stopped, and when 
sessions are created and ended. The resource 
bundle part of the GUI provides possibilities to 
select the current language and to change default 
resources of each type for each language. The 
latter facility enables the operator to, for 
instance, select a TTS engine with another voice 
to be the default. This effectively changes the 
voice of the application, given that the 
application is coded to use the default TTS 
engine. The resource bundle GUI also has a 
message pane that logs resource status 
information. 

5.2. Examples 

In this section we briefly describe four systems 
that use ATLAS, as examples of how the 
platform can be used. For each system, we 
explain its task, what has been coded in its 
application-dependent layer, which ATLAS 
layers are used, and what speech technology 
components are included in the resource layer. 

All four systems have in common that their 
application-dependent layer is coded in Java and 
that it uses ATLAS application support classes 
to implement application and session classes. 

5.2.1. CTT-bank 

CTT-bank is a speech controlled telephone 
banking system (Ihse, 2000; Melin et al., 2001). 
Customers identify themselves to the system by 
saying their name and a short digit sequence. 
The digit sequence is chosen by the customer 
himself during registration, and is used to make 
the identification phrase unique. After claiming 
an identity, he verifies the claim by repeating a 
four-digit password generated by the system. 
Once allowed access to the system, the user can 
check account balance, list recent transactions, 
and transfer funds between accounts. 

The application-dependent layer defines 
several dialogue components to implement the 
banking services and part of the registration 
dialogue. Dialogue components use methods 
and objects in various ATLAS layers for their 
implementation. ATLAS dialogue components 
for enrolment and login are extended and 
specialised, and used to implement registration 
and user authentication dialogues. Specialisation 
includes using an error-correcting code with a 
seven-digit registration number used to 
authenticate the user during the registration call, 
and changing prompt texts to fit the application. 

The resource bundle contains a speaker 
verification engine, several speech recognition 
engines, several text-to-speech engines, a speech 
detector, two ISDN terminals, a desktop-based 
terminal, a sound coder, a file-oriented database 
and a MySQL database driver. The multitude of 
speech recognition engines is needed because 
the speech recogniser used does not support 
online grammar modification. One engine is 
therefore created for each specialised grammar 
used in the application (Melin et al., 2001). 

5.2.2. PER 

The PER system (from Prototype Entrance 
Receptionist) is installed at the central entrance 
of the Department of Speech, Music and 
Hearing. The system in its current state is 
basically a voice-operated lock: employees at 
the department may open the door by saying 
their name followed by a random digit sequence 
displayed on a screen. Speech recognition and 
speaker verification are used to authenticate the 
user, and an animated agent gives the user 
feedback by greeting him or asking him to try 
again. The physical installation includes a 
screen, a high-quality microphone, a relay to 
unlock the door, and several sensor devices to 
detect the presence of a person. In the future, the 
system will be extended to engage in dialogue 
with visitors to provide assistance. 

The system was initially developed as part of 
a student project (Armerén, 1999). It has 
recently been re-designed to employ the ATLAS 
platform, and several improvements have been 
made (Pakucs & Melin, to appear). 



Most of the current dialogue is implemented 
by ATLAS dialogue components for enrolment 
and login. For future development, a general- 
purpose dialogue manager, SESAME, will be 
added on top of ATLAS. The current application 
has been localised to Swedish and English. It is 
not a prioritised task, however, to keep system 
extensions, such as more advanced language 
understanding and dialogue control, bilingual. 

The resource bundle contains one speaker 
verification engine, two speech recognition 
engines, and several text-to-speech engines per 
supported language. It also contains an animated 
agent, a graphical display, a speech detector, 
two ISDN terminals, one terminal object per 
detector, a sound coder, a file-oriented database 
and a MySQL database driver. Several of the 
resources are created especially for this 
application, including the display that presents 
the random password on the screen, and 
detectors with drivers to sensor devices. 

Regarding the session metaphor used in 
ATLAS application support classes, each 
terminal object (except the telephone-based) 
uses a detector to decide when to trigger the start 
of a new session in the application. It is then up 
to the session logic to decide when the session 
has finished, i.e. when a user has left. 

5.2.3. Picasso Impostor Trainer 

The Picasso Impostor Trainer system was 
developed for a study on speakers' ability to 
imitate other speakers (Elenius, 2001). A subject 
calls the system while sitting in front of a 
computer. He speaks a list of digit sequences to 
provide a sample of his normal voice, and the 
system compares the voice to a list of speaker 
models and selects two target speakers for 
imitation. The subject can then interactively 
practise imitating a target speaker under 
controlled conditions: by listening to recordings 
of the target speaker, by watching a display with 
scores from the speaker verification system for 
his own practice utterances tested against the 
target speaker's model, and combinations of the 
previous two. After each training round, the 
subject speaks ten utterances without feedback 
to test if he is able to alter his voice to get 
"closer" to the target speaker in the sense of the 
speaker verification system. 

The application-dependent layer uses no 
dialogue components. Instead, the application 
uses the listen method in the ATLAS services 
layer to input and process utterances, and the 
play method to play back pre-recorded samples 
of target speakers. It has an elaborate GUI for 
system feedback to the user and for mouse input. 



The resource bundle contains a speaker 
verification engine, a speech recognition engine, 
a speech detector, a sound coder, a media file 
database, and an ISDN terminal. 

5.2.4. Hörstöd 

This system was developed for investigating if a 
hearing impaired person can be aided by 
transcriptions produced by a phoneme 
recogniser in understanding speech during a 
telephone conversation (Johansson, to appear). 

The application-dependent layer is fairly 
similar in content to the Picasso Impostor 
Trainer. It uses a telephony terminal for audio 
input and a GUI for graphical output. It uses 
the ATLAS high-level primitives layer to input and 
process utterances, and no dialogue components 
are used. The resource bundle contains the same 
resources as in Impostor Trainer, except that no 
speaker verifier or speech detector is included. 

6. Discussion 

The main difficulty in creating a generic 
application platform is to make it both efficient 
and flexible. For a given application, the 
platform should be efficient to use to minimise 
development costs. But it should at the same 
time be flexible enough to be efficient for 
another type of application, and to allow 
adaptation to new types of applications. We 
believe that the layered structure employed in 
ATLAS is powerful in this regard. By providing 
low-level APIs and structures, where little is 
assumed about the overall application structure, 
we allow very diverse applications to share at 
least the same speech technology components. 
In the higher, more powerful layers, we assume 
more and more about application structure. 
These layers are efficient to use with 
applications for which these assumptions are 
valid, and may simply be ignored by 
applications for which they are not. To develop 
the platform to provide powerful layers also to 
new and diverse applications, we can either 
adapt the existing layer implementations and 
generalise them, or we can create parallel 
implementations with other assumptions 
regarding system structure. As a development 
process, we suggest first developing new 
applications with whatever parts of ATLAS are 
useful, then analysing the application-specific 
code to see what is general and what is specific 
to the particular application, and finally 
moving the general parts into ATLAS step by 
step. This is how ATLAS can evolve with 
research advances. 



VoiceXML 22, 23 is emerging as a standard 
markup language for representing human- 
computer dialogues. To relate ATLAS and 
VoiceXML to each other, we first try to describe 
the latter in the context of the system model 
illustrated in Figure 1. The VoiceXML standard 
primarily defines a specification for the interface 
between the application-dependent layer and a 
voice browser. The application-dependent layer 
in a VoiceXML application is very "thin" and is 
represented by a set of XML documents. The 
voice browser is an application-independent 
engine that implements dialogues according to 
given VoiceXML documents. It thus includes 
the functionality of the middle layer and the 
resource layer of Figure 1 (though it may have 
an entirely different structure). We therefore 
suggest that VoiceXML corresponds to the 
speech-technology API in ATLAS. 

While ATLAS gives the application 
programmer access to all its internal layers for 
retained flexibility, VoiceXML provides access 
to rather high-level functionality in the voice 
browser, but not to low-level details. For 
instance, a VoiceXML application can tell the 
browser to ask a multiple-choice question, but 
cannot manipulate the voice browser's speech 
recogniser directly (except possibly through a 
browser vendor's proprietary features). 
VoiceXML has been created based on some 
assumptions about system structure and 
capabilities, and as a standard it also imposes 
corresponding constraints on what applications 
can be created. It is therefore efficient for those 
kinds of applications. Because VoiceXML is a 
standard, a VoiceXML application can also be 
executed in a standard-compliant voice browser 
from any vendor 24 . 

Creating media streams on a per-utterance 
basis allows demands for real-time performance 
in some parts of the system to be reduced, 
compared to if a continuous (unbuffered) media 
stream on a central bus were used. This is an 
advantage in a research system since it allows, 
for instance, an experimental speech recogniser 
to take the time it needs to prepare for and 
process an utterance. A slow component need 
not risk that other components involved in the 
processing of the same utterance looses any 
samples. We also believe per-utterance streams 
make system programming somewhat easier. It 
has a couple of disadvantages, however. Setting 



http://www.w3.org/voice/ 

23 http://www.voicexmI.org/ 

24 Several vendors market voice browsers that 
implement VoiceXML, including Tellme, Motorola, 
Nuance and Pipebeach. 



up streams for every new dialogue turn or 
utterance takes more time than simply telling, a 
device to start listening on an already connected 
media bus, possibly resulting in a slower system. 
It may also make it more difficult to use full 
duplex input/output streams, to implement 
barge-in, etc. Most standard APIs related to 
audio and speech also tend to assume a central 
media bus. We are therefore considering continuous,
buffered media streams for future versions.
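
The trade-off between the two stream models can be illustrated with a minimal Python sketch. It is not ATLAS code: the class names UnbufferedBus and UtteranceStream, the sample values and the "ready" pattern are all hypothetical, chosen only to show why a slow consumer loses samples on an unbuffered bus but not on a per-utterance stream.

```python
from collections import deque


class UnbufferedBus:
    """Hypothetical continuous bus without buffering: a sample is lost
    unless the listener is ready at the moment it is published."""

    def __init__(self):
        self.delivered = []

    def publish(self, sample, listener_ready):
        if listener_ready:
            self.delivered.append(sample)
        # Otherwise the sample is silently dropped.


class UtteranceStream:
    """Hypothetical per-utterance stream: samples are queued until the
    consumer chooses to read them."""

    def __init__(self):
        self._queue = deque()
        self._closed = False

    def write(self, sample):
        if self._closed:
            raise ValueError("stream is closed")
        self._queue.append(sample)

    def close(self):
        self._closed = True

    def read_all(self):
        samples = list(self._queue)
        self._queue.clear()
        return samples


def demo():
    samples = list(range(8))
    # Assumption: the recogniser is still preparing while the first
    # three samples arrive.
    ready = [False, False, False, True, True, True, True, True]

    bus = UnbufferedBus()
    for sample, is_ready in zip(samples, ready):
        bus.publish(sample, is_ready)

    stream = UtteranceStream()
    for sample in samples:
        stream.write(sample)
    stream.close()

    # The slow consumer reads the whole utterance when it is ready.
    return bus.delivered, stream.read_all()
```

In this sketch the unbuffered bus delivers only the samples published after the consumer became ready, while the per-utterance stream returns the complete utterance. A continuous, buffered stream would combine the queueing shown in UtteranceStream with a single long-lived connection.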

For ATLAS component APIs we have in 
general not used public standard APIs (the 
exceptions are JDBC for SQL database 
connections and SVAPI for the speaker 
verification). This is because, first of all, for 
most current component implementations we 
use in-house technology developed before 
ATLAS was conceived, and we chose to design 
APIs that match the abilities of the current 
technology. Implementing a standard API such
as the Java Speech API (JSAPI) for the TTS
component, for instance, would have resulted in
overhead work at this stage. Second, the main candidate for a
standard API in ATLAS would be JSAPI, and 
there are not (yet) many speech engines 
available that implement JSAPI. Furthermore, 
JSAPI in its current state does not integrate well 
with the corresponding Java Sound API and 
Java Telephony API that would enable us to 
maintain the audio device independence of 
ATLAS. Third, as we outlined in section 4.1, an 
ATLAS component API can be mapped to a 
standard API via a bridge to enable the use of 
engines with a standard API. 
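
The bridge idea can be sketched as a simple adapter. The following Python fragment is only an illustration under stated assumptions: TextToSpeechComponent stands in for an ATLAS-style component API, StandardStyleEngine stands in for an engine with a standard-style API, and all class and method names are invented for this example. None of them are taken from ATLAS or JSAPI.

```python
from abc import ABC, abstractmethod


class TextToSpeechComponent(ABC):
    """Hypothetical stand-in for an internal component API."""

    @abstractmethod
    def say(self, text: str) -> None:
        ...


class StandardStyleEngine:
    """Hypothetical engine exposing a standard-style API.
    Method names are invented for illustration."""

    def __init__(self):
        self.allocated = False
        self.spoken = []

    def allocate(self):
        self.allocated = True

    def speak_plain_text(self, text):
        if not self.allocated:
            raise RuntimeError("engine not allocated")
        self.spoken.append(text)


class StandardApiBridge(TextToSpeechComponent):
    """Bridge: implements the internal component API by delegating
    to an engine that only knows the standard-style API."""

    def __init__(self, engine):
        self._engine = engine

    def say(self, text):
        # Hide engine life-cycle details from the application.
        if not self._engine.allocated:
            self._engine.allocate()
        self._engine.speak_plain_text(text)
```

An application written against TextToSpeechComponent would call say() without knowing which engine sits behind the bridge, so the engine can be swapped without changing application code.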

A corresponding division between internal 
and standard APIs is seen in Jaspis (Turunen & 
Hakulinen, 2000). In its input/output 
architecture, virtual devices are abstract units 
that represent more concrete engines. Virtual 
devices serve as the interface between engines 
(below) and agents and the communication 
manager (above) and partly correspond to 
ATLAS component APIs. Below the virtual 
device level in Jaspis are the client, server and
engine levels, and standard APIs are employed
between the server and engine levels (cf. Figure
2, example 2).

7. Conclusions 

ATLAS has been presented as a framework for 
building demonstration applications with speech 
technology. So far it has proved useful for 
research in four CTT projects. The CTT-bank 
system has been used both in a usability study 
and for collecting data for evaluation of speech 
recognition and speaker verification
performance. The Horstod system will likewise be used



in a usability study and to test the performance 
of a phoneme recogniser. The Picasso Impostor 
Trainer was used to test how speakers are able to 
imitate other people's voices. The PER system 
has been used to test speaker verification 
performance, and will be a platform for further 
exploitation of speech technology within CTT 
and our host department Speech, Music and 
Hearing (TMH). 

The high-level speech technology API and 
the application support classes in ATLAS make 
application building easier compared to 
programming with speech technology 
components directly. Three of the current 
ATLAS applications (CTT-bank, Picasso 
Impostor Trainer and Horstod) were created as 
part of student projects. The platform has thus 
proved to be useful also for educational 
purposes. 

The future development of the platform will 
include added support for natural language 
processing, such as message generation and 
parsing, and interfacing to other speech 
recognisers, for example ACE 25 (Seward, 2000) 
developed at CTT. More technical developments
include added logging facilities and support for
re-creating lost server-based resources, for
increased stability.

ATLAS is currently developed and used only 
within CTT, but we see a possibility to make it 
publicly available in the future. 

8. Acknowledgements 

This research was carried out at the Centre for 
Speech Technology (CTT), a competence centre 
at KTH, supported by VINNOVA (The Swedish 
Agency for Innovation Systems), KTH and 
participating Swedish companies and 
organisations. 

9. References 

Armeren E (1999). Site access controlled by speaker 
verification. MSc Thesis, TMH, KTH, Stockholm, 
June. 

Bayer S, Doran C & George B (2001). Dialogue 
interaction with the DARPA Communicator 
infrastructure: the development of useful software. 
Notebook proceedings, First International 
Conference on Human Language Technology - 
Research, San Diego, California, March 18-21, 
179-181. 

Beskow J (1995). Rule-based visual speech 
synthesis. Proc of Eurospeech, Madrid, Sept 18-21,
2:299-302. 

Bickley C & Hunnicutt S (2000). ENABL - Enabler 
for engineering software using language and 



25 http://www.speech.kth.se/ace/



speech. TMH-QPSR, KTH, Stockholm, 1/2000:1-11.

Carlson R, Granstrom B & Karlsson I (1990). 
Experiments with voice modeling in speech 
synthesis. Proc of ESCA workshop on Speaker
Characterization in Speech Technology, June 26-28,
Edinburgh, also Speech Communication
10:481-489. 

Carlson R, Granstrom B & Hunnicutt S (1982). A 
multi-language text-to-speech module. Proc of 
ICASSP, Paris, 3:1604-1607. 

Dutoit T, Pagel V, Pierret N, Batialle F & van der
Vreken O (1996). The MBROLA project: towards
a set of high-quality speech synthesizers free of use
for non-commercial purposes. Proc of ICSLP,
Philadelphia, 3:1393-1396.

Elenius D (2001). Hamming - ett hot mot
talarverifieringssystem? MSc Thesis (in Swedish), 
TMH, KTH, Stockholm, February. 

Elenius K (2000). Experiences from collecting two 
Swedish telephone speech databases. International 
Journal of Speech Technology, 3:119-127. 

Filipsson M & Bruce G (1997). LUKAS - a
preliminary report on a new Swedish speech 
synthesis. Working Papers, Dept of Linguistics, 
Lund University, Lund, Sweden, 46:47-56. 

Gustafson J, Bell L, Beskow J, Boye J, Carlson R, 
Edlund J, Granstrom B, House D & Wiren M 
(2000). AdApt - a multimodal conversational 
dialogue system in an apartment domain. Proc of 
ICSLP, Beijing, Oct 16-20, 2:134-137. 

Gustafson J, Lindberg N & Lundeberg M (1999). 
The August spoken dialogue system. Proc of 
Eurospeech, Budapest, Sept 5-9, 3:1151-1154. 

Hoge H, Tropf HS, Winski R, van den Heuvel H, 
Haeb-Umbach R & Choukri K (1997). European 
speech databases for telephone applications. Proc 
of ICASSP, Munich, Apr 21-24, 3:1771-1774. 

Ihse M (2000). Usability study of a speech controlled 
telephone banking system. MSc Thesis, TMH, 
KTH, Stockholm, September. 

Johansson M (to appear). Phoneme recognition as a 
hearing aid in telephone communication. MA 
Thesis, Dept. of Linguistics, Uppsala University, 
Uppsala, Sweden. 

Lewin E (1997). The Broker architecture at TMH. 
http://www.speech.kth.se/broker. 

Lindberg B, Johansen F T, Warakagoda N, Lehtinen 
G, Kacic Z, Zgank A, Elenius K & Salvi G (2000). 
A noise robust multilingual reference recogniser 
based on SpeechDat(II). Proc of ICSLP, Beijing, 
Oct 16-20, 3:370-373.

Melin H & Lindberg J (1999). Variance flooring, 
scaling and tying for text-dependent speaker 
verification. Proc of Eurospeech, Budapest,
Sept 5-9, 5:1975-1978.

Melin H, Sandell A & Ihse M (2001). CTT-bank: A 
speech controlled telephone banking system - an 
initial evaluation. TMH-QPSR, KTH, Stockholm, 
1:1-27. 

Melin H (to appear). Automatic speaker verification 
in telephone banking: a field test. 



Nuance (2000). SpeechObjects: An architectural 
overview. White Paper, Nuance, Menlo Park CA, 
USA, http://www.nuance.com/

Pakucs B & Melin H (to appear). PER - A speech 
based automated entrance receptionist. Proc of 
NODALIDA, Uppsala, Sweden, May 21-22. 

Potamianos A, Kuo H-K, Lee C-H, Pargellis A, Saad 
A & Zhou Q (1999). Design principles and tools 
for multimodal dialog systems. Proc of ESCA 
Workshop on Interactive Dialogue in Multi-Modal 
Systems, Kloster Irsee, Germany, June 22-25,
169-172.

Salvi G (1998). Developing acoustic models for 
speech recognition. MSc Thesis, TMH, KTH, 
Stockholm, October. 

Seneff S, Hurley E, Lau R, Pao C, Schmid P & Zue 
V (1998). Galaxy-II: A reference architecture for
conversational system development. Proc of 
ICSLP, Sydney, Nov 30-Dec 4, 3:931-934. 

Seward A (2000). A tree-trellis n-best decoder for 
stochastic context-free grammars. Proc of ICSLP, 
Beijing, Oct 16-20, 4:282-285. 

Sjolander K & Beskow J (2000). WaveSurfer - an 
open source speech tool. Proc of ICSLP, Beijing, 
Oct 16-20, 4:464-467. 

Strom N (1996). Continuous speech recognition in 
the WAXHOLM dialogue system. STL-QPSR, 
KTH, Stockholm, 4/1996:67-96.

Sutton S, Cole R, Villiers J, Schalkwyk J, Vermeulen 
P, Macon M, Yan Y, Kaiser E, Rundle B, Shobaki 
K, Hosom P, Kain A, Wouters J, Massaro D & 
Cohen M (1998). Universal speech tools: the 
CSLU toolkit. Proc of ICSLP, Sydney, Nov 30-Dec 4,
7:3221-3224.

Turunen M & Hakulinen J (2000). Jaspis - a 
framework for multilingual adaptive speech 
applications. Proc of ICSLP, Beijing, Oct 16-20, 
2:719-722. 



PALM Resource Center 



13 Records Were Found 



Page 1 of 1

Employee                        Office       Building  Fl.-Ste./Corr.-Rm  Contact No.    Type  Ext
[illegible record]
[illegible record]
[illegible]                     [illegible]  PK2       03/C20             (703)305-8754  T
BULLOCK JR LEWIS A              P/2126       PK2       05/B52             (703)305-0439  T
CAO DIEM K                      P/2126       PK2       05/A06             (703)305-5220  T
COURTENAY III ST JOHN (JOHN)    P/2126       PK2       05/D42             (703)308-5217  T
HO THE T                        P/2126       PK2       05/S01             (703)306-5540  T
HOANG PHUONG N                  P/2126       PK2       05/A10             (703)605-4239  T
LAO SUE X                       P/2126       PK2       05/A13             (703)305-9657  T
NGUYEN VAN H                    P/2126       PK2       05/D51             (703)306-5971  T
OPIE GEORGE L                   P/2126       PK2       05/A02             (703)308-9120  T
TRUONG LECHI                    P/2126       PK2       05/T01             (703)305-5312  T
ZHEN LI B                       P/2126       PK2       05/U01             (703)305-3406  T




Contact Number Type: T - Telephone, F - Fax, R - Receptionist, P - Pager, M - Mobile 

Employee Search Completed 
No more records to search 






